Authors: Mohit Sewak (Microsoft), Sanjay K. Sahay (BITS Pilani), Hemant Rathore (BITS Pilani)
DFRWS APAC 2022
Abstract
Neural Architecture Search (NAS) was intended to bring Machine Learning to the masses. Ironically, because of its high resource requirements, it remained exclusive to the elite. After several efficiency enhancements, its most efficient version, Efficient-NAS (ENAS), found a place in some commonly used Deep Learning libraries, but it still could not gain mass popularity. In the field of malware forensics in particular, there exists no popular implementation of NAS. AutoML, as it stands today, comprises NAS and hyperparameter tuning as sub-domains. Yet, from both effort and impact perspectives, the data dimension carries 80% of the weight in an ML problem, and this dimension is currently missing from AutoML. In forensics, optimal sample discovery may have more impact than optimal model discovery. Therefore, in this paper, we propose Neural Sample Search (NSS) using DRo, to bring the data discovery dimension into AutoML. Further, we show that, for malware forensics, NSS outperforms all expert-curated and NAS-suggested models by an exceptionally large margin. This result gains further significance because the baseline expert model had over 6700% higher neural inference complexity than the NSS model and was curated through the efforts of several forensic experts over several years to reach that performance level, while the Efficient-NAS model had (ironically) over 100,000% higher neural inference complexity than the proposed NSS mechanism. Given such high performance at the minimal model footprint and complexity that NSS brings, we claim that, by including NSS, AutoML can truly be ready for mass adoption in the field of malware forensics.