Authors: Wen Qiaokun (University of Hong Kong) and KP Chow (University of Hong Kong)
DFRWS APAC 2021
Malware detection is an important task in digital forensics. As technology advances, malware has become increasingly polymorphic. During a digital investigation, forensic examiners often cannot obtain the entire malware file. For example, in corporate cybersecurity forensics, packet capture tools deployed by different companies often fail to capture the entire file because of limits on network packet length. In addition, file deletion may leave behind residual malware segments. Because we do not even know which part of the file a recovered segment comes from, we cannot rely on much domain knowledge for detection. Therefore, this paper proposes detecting malware from very small binary fragments of PE files using a CNN-based model. Datasets, and test sets in particular, are often one of the most difficult problems in zero-day malware detection, because zero-day means that the malware has never appeared before. In this paper, we collect the data by taking advantage of the differences in anti-virus tool results at different points in time. Experiments are performed on malware fragments of different lengths, positions, and combinations. Through these experiments, we found that only a short segment is needed to achieve relatively good accuracy. For a random contiguous piece of malicious code, we achieved an accuracy of up to 0.86 when the fragment length is 60,000 bytes. For non-contiguous and unordered random pieces of malicious code, we obtained an accuracy of up to 0.83 using fragments of only 1,024 bytes (1 KB). When using 60,000-byte fragments as the baseline, we achieved an accuracy of 0.91.
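The two fragment settings described in the abstract (one random contiguous piece, and several non-contiguous, unordered pieces) can be illustrated with a minimal sketch. This is not the authors' sampling procedure, which the abstract does not specify: the function names, the 128-byte piece size, and the zero-padding step are all assumptions made for illustration.

```python
import random


def contiguous_fragment(data: bytes, length: int, rng: random.Random) -> bytes:
    """Return one contiguous fragment of `length` bytes from a random offset.

    If the file is shorter than `length`, the whole file is returned.
    """
    if len(data) <= length:
        return data
    start = rng.randrange(len(data) - length + 1)
    return data[start:start + length]


def scattered_fragment(data: bytes, total_length: int, piece_size: int,
                       rng: random.Random) -> bytes:
    """Return non-contiguous pieces concatenated in random order.

    The file is cut into non-overlapping pieces of `piece_size` bytes
    (an assumed value, not taken from the paper); enough pieces are drawn
    without replacement to reach `total_length` bytes when possible.
    """
    n_pieces = total_length // piece_size
    pieces = [data[i:i + piece_size]
              for i in range(0, len(data) - piece_size + 1, piece_size)]
    # rng.sample returns the pieces in selection order, so the original
    # file order is not preserved.
    chosen = rng.sample(pieces, min(n_pieces, len(pieces)))
    return b"".join(chosen)


def to_model_input(fragment: bytes, length: int) -> list[int]:
    """Truncate or zero-pad a fragment to a fixed length of byte values (0-255).

    Fixed-length input is an assumption about how a CNN would consume it.
    """
    padded = fragment[:length].ljust(length, b"\x00")
    return list(padded)
```

For example, `contiguous_fragment(file_bytes, 60_000, random.Random(0))` would yield a 60,000-byte window, and `scattered_fragment(file_bytes, 1024, 128, random.Random(0))` would yield a 1 KB input built from eight shuffled pieces, where `file_bytes` is a hypothetical variable holding the raw contents of a PE file.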