Ian Shiel
Stephen O'Shaughnessy


Malware analysts need to determine swiftly and accurately not only whether a suspect file contains malicious content but also which malware family it belongs to. Previous research has shown that fuzzy hashing can be used to determine whether a file is malicious and to cluster similar files together, but that research does not specifically address the problem of malware variant classification.
Existing tools such as VirusTotal maintain file- and section-level cryptographic hashes and file-level ssdeep digests. However, they do not maintain section-level similarity hashes, and they provide no means to submit similarity hashes and compare them with those of previously analyzed samples.

This paper presents a novel method of section-level hashing that overcomes the limitations of file-level hashing. A framework built with tools developed in Python processes Portable Executable (PE) files and calculates and compares ssdeep similarity digests at both the file and the section level. The framework is used to run experiments on executable files from known malware families. File- and section-level digests are used to predict malware family membership, and the two methods are compared using precision, recall and accuracy.
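The minimal Python sketch below illustrates the difference between file-level and section-level comparison. It is not the framework described in this paper, and it rests on several assumptions. The helper names (`pe_sections`, `similarity`, `section_score`, `predict`, `evaluate`) are hypothetical. The PE section table is parsed with the standard `struct` module. `difflib.SequenceMatcher` on raw bytes stands in for ssdeep, so the example runs without third-party packages. A real implementation would compare stored ssdeep digests, for example with the python-ssdeep package's `hash` and `compare` functions. The nearest-neighbour rule (mean similarity over shared section names) and the macro-averaged metrics are also illustrative choices rather than the paper's method.

```python
import struct
from collections import defaultdict
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def similarity(a: bytes, b: bytes) -> int:
    """Similarity score on ssdeep's 0-100 scale.

    ASSUMPTION: stand-in for ssdeep; a real framework would compare
    stored ssdeep digests of a and b rather than the raw bytes.
    """
    if not a or not b:
        return 0
    return round(100 * SequenceMatcher(None, a, b, autojunk=False).ratio())


def pe_sections(data: bytes) -> Dict[str, bytes]:
    """Map each section name in a PE file to that section's raw bytes."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ header")
    e_lfanew = struct.unpack_from("<I", data, 0x3C)[0]  # offset of PE signature
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # 20-byte COFF file header: Machine, NumberOfSections, TimeDateStamp,
    # PointerToSymbolTable, NumberOfSymbols, SizeOfOptionalHeader, Characteristics
    _, n_sections, _, _, _, opt_size, _ = struct.unpack_from("<HHIIIHH", data, e_lfanew + 4)
    table = e_lfanew + 4 + 20 + opt_size  # section table follows optional header
    sections = {}
    for i in range(n_sections):
        # Each 40-byte section header begins with Name, VirtualSize,
        # VirtualAddress, SizeOfRawData, PointerToRawData
        name, _, _, raw_size, raw_ptr = struct.unpack_from("<8sIIII", data, table + 40 * i)
        key = name.rstrip(b"\0").decode("ascii", "replace")
        sections[key] = data[raw_ptr:raw_ptr + raw_size]  # duplicate names overwrite
    return sections


def section_score(query: Dict[str, bytes], ref: Dict[str, bytes]) -> float:
    """Hypothetical rule: mean similarity over section names both samples share."""
    shared = query.keys() & ref.keys()
    if not shared:
        return 0.0
    return sum(similarity(query[n], ref[n]) for n in shared) / len(shared)


def predict(query: bytes, refs: List[Tuple[str, bytes]], level: str = "section") -> str:
    """Label the query with the family of its best-matching reference sample."""
    if level == "file":
        scored = [(family, float(similarity(query, data))) for family, data in refs]
    else:
        q = pe_sections(query)  # parse the query once
        scored = [(family, section_score(q, pe_sections(data))) for family, data in refs]
    return max(scored, key=lambda fs: fs[1])[0]


def evaluate(y_true: List[str], y_pred: List[str]) -> Dict[str, float]:
    """Macro-averaged precision and recall, plus overall accuracy."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted family p was wrong
            fn[t] += 1  # true family t was missed
    labels = sorted(set(y_true) | set(y_pred))
    prec = [tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0 for c in labels]
    rec = [tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0 for c in labels]
    return {
        "precision": sum(prec) / len(labels),
        "recall": sum(rec) / len(labels),
        "accuracy": sum(tp.values()) / len(y_true),
    }
```

Scoring per section means that a change confined to one section (for example, a repacked resource) lowers only that section's score. At file level, the same change is blended into a single whole-file score. In practice each file and section would be hashed once and only the digests stored and compared. Byte-level matching as above can be slow on large sections and is suitable only for small toy inputs.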

The results show that similarity digests can be used to classify malware in Windows PE files, and that section-level hashing and comparison produce considerably better results than file-level hashing and comparison.