Authors: Saed Alrabaee (Concordia University)



This work aims to develop an automatic tool for binary authorship characterization, a laborious and error-prone reverse engineering task that consists of determining clues related to the author(s) of a piece of binary code. Software code written by human programmers reflects the author’s educational background, level of expertise, and coding traits. Accordingly, these attributes can be characterized by identifying and examining meaningful features. Binary authorship characterization reveals information that can be extremely useful for security applications such as digital forensics, malware triage, and binary vulnerability tracking. This paper proposes a system, BinChar, that captures various aspects of author style, including code trait characteristics, code structure characteristics, and code behavior characteristics. For detection, a Convolutional Neural Network (CNN) is used, and the results generated by the CNN are then assessed more precisely using Bayesian calibration. We tested BinChar on identifying the characteristics of the authors of program binaries. We also applied it to almost 500 GB of malware samples provided by the Kaggle Microsoft Malware Classification Challenge to demonstrate that BinChar is an appropriate tool for characterizing malware families. As an illustration, we report a case study in which we determine the author characteristics of the Mirai botnet and compare them with the author characteristics of 360,000 malware samples.