Authors: Anthony Andreoli, Anis Lounis, Mourad Debbabi, Aiman Hanna



Software Supply Chain Attacks (SSCAs) typically compromise hosts through trusted but infected software. The intent of this paper is twofold. First, we present an empirical study of the most prominent software supply chain attacks and their characteristics. Second, we propose an investigative framework for identifying, expressing, and evaluating characteristic behaviours of newfound attacks for mitigation and future defense purposes. We hypothesize that these behaviours are statistically indicative of malice and predate the attacks that used them, so the attacks could have been thwarted had the behaviours been identified and codified years earlier. To measure this, a large-scale ground-truth corpus of over 10 million functions is assembled from three file classes: malware, benign, and Windows 10 binaries. An expressive query system is proposed for matching behaviours on top of semantic graphs constructed with data-flow, control-flow, and AST annotations. We leverage conditional probabilities to assess malicious intent by considering the SSCA behaviours matched in the three file classes. Our analysis reveals that the presence of an SSCA behaviour within a binary indicates malware with 86% to 100% probability. We also annotate each SSCA behaviour with context information that, when applied as a filter over matched dataset samples, is found to raise the estimated probability of malicious intent by up to 30%. In addition, we present a novel data-flow metric, parametric momentum, which is a powerful gauge of malicious intent that alone matches 12.71% of malware with zero false positives. Finally, we perform a temporal analysis of the SSCA behaviours present in our dataset and discover that they have been available for 13 to 21 years prior to each attack; conceivably enough time to have been identified and used to mitigate the SSCA instances.
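To make the conditional-probability assessment concrete, the sketch below estimates the probability that a sample is malware given that a behaviour matched, using match counts from the three file classes. It is a minimal illustration under stated assumptions, not the paper's exact formulation: the names `ClassCounts` and `p_malware_given_match` are hypothetical, the estimate is the plain empirical ratio of matched malware samples to all matched samples, and the counts in the usage lines are invented for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClassCounts:
    """Match statistics for one file class (hypothetical structure)."""
    matched: int  # samples in this class where the behaviour matched
    total: int    # all samples in this class


def p_malware_given_match(
    malware: ClassCounts, benign: ClassCounts, windows: ClassCounts
) -> Optional[float]:
    """Empirical estimate of P(malware | behaviour matched).

    Assumption: the probability is taken as matched malware samples
    divided by matched samples across all three classes. The result
    therefore depends on the class proportions of the corpus.
    """
    matched_all = malware.matched + benign.matched + windows.matched
    if matched_all == 0:
        return None  # behaviour never matched, so the estimate is undefined
    return malware.matched / matched_all


# Illustrative counts only; these are not results from the paper.
estimate = p_malware_given_match(
    ClassCounts(matched=90, total=1000),
    ClassCounts(matched=8, total=1000),
    ClassCounts(matched=2, total=1000),
)
print(estimate)
```

Because this estimate is sensitive to how many samples each class contributes, a corpus with unbalanced classes would call for normalising by per-class match rates (`matched / total`) before comparing behaviours.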