Chia-Ching Fang
Shih-Hao Weng


Background and Related Research - In malware research, threat hunting, and security intelligence exchange, hashes such as MD5 and SHA256 play a dominant role. Malware researchers search for malware on VirusTotal by hash and exchange security intelligence as IoCs (indicators of compromise) that include hashes. However, a hash has a one-to-one relationship with its file, which limits researchers' ability to correlate files. Of course, that is not what hashes were made for. For this reason, other "enhanced" hashes have been proposed, such as ssdeep, sdhash, TLSH, and imphash, which help measure the similarity of binary files. All of them are calculated from a binary point of view; other methodologies measure the similarity of executable files from a graph point of view. For example, Zynamics BinDiff takes a bigger-picture view of executables to learn the similarities and differences between two executable files. It gives researchers very detailed information about which parts of the two files are similar; however, it can process only two files at a time. This research, graph hash, tries to combine the advantages of these two types of methodologies: it calculates the hash of an executable file from a graph view, which helps classify malware in a consistent and efficient way.
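
The one-to-one limitation described above can be illustrated with a minimal sketch. It assumes nothing about graph hash itself and uses only SHA256 from the Python standard library. The byte strings are made-up stand-ins for two nearly identical files, and the helper names (sha256_hex, differing_hex_digits) are hypothetical, chosen only for this illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def differing_hex_digits(a: str, b: str) -> int:
    """Count the positions at which two equal-length hex digests differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


# Made-up stand-ins for two files that differ in a single bit.
original = b"MZ" + bytes(64)
patched = bytearray(original)
patched[-1] ^= 0x01  # flip one bit in the last byte
patched = bytes(patched)

h1 = sha256_hex(original)
h2 = sha256_hex(patched)

print(h1)
print(h2)
print(differing_hex_digits(h1, h2), "of", len(h1), "hex digits differ")
```

The two digests share no usable structure even though the inputs are almost identical, so a cryptographic hash can confirm that two files are the same but says nothing about how similar two different files are. That gap is what fuzzy hashes and graph-based approaches aim to fill.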

*** Comparison of Hash, Fuzzy Hash, and Graphic Binary Diff - This will introduce hash, fuzzy hash, and graphic binary diff, and the advantages and disadvantages of each.

*** What Is Graph Hash - This will detail what graph hash is and how to calculate it from the ground up.

*** Benefits of Graph Hash - This will discuss the advantages of graph hash and how to use it in real-world situations.

*** Live Demo - This will demonstrate how graph hash applies to malware classification.

*** Testing Results - This will present the testing results on about one million malware samples.

*** The Limitations of Graph Hash - This will discuss the limitations of graph hash under certain conditions.