Authors: Carlo Jakobs, Axel Mahr, Martin Lambertz, Mariia Rybalka, Daniel Plohmann

DFRWS USA 2025 — “History in the Making” — Jubilee 25th Anniversary

Abstract

This research explores the application of bytewise approximate matching algorithms to executable files, evaluating the effectiveness of ssdeep, sdhash, TLSH, and MRSHv2 in scenarios where approximate matching seems a natural tool to employ. Previous work has already pointed out that approximate matching is often used for tasks for which the algorithms have not been thoroughly and systematically evaluated. Pagani et al., in particular, highlighted the shortcomings of previous research and sought to improve current knowledge about the applicability of approximate matching to executable files by evaluating typical use cases. We extend their work by examining further common scenarios that their article does not cover: different versions of the same software, and comparisons between on-disk and in-memory representations of the same program, for both malicious and benign software.

Our findings reveal that the performance of the considered algorithms was generally unsatisfactory across all evaluated scenarios. Notably, they struggle with size-related and localized modifications introduced during the loading stage. Furthermore, executables with no functional similarity may be matched erroneously because they share byte-level content, stemming either from embedded resources or from material inherent to certain programming languages or runtime environments. Consequently, these algorithms should be used cautiously and regarded as assisting tools rather than reliable methods for indicating similarity between executable files, since both false positives and false negatives can occur.

Moreover, while some of the unfavorable results stem from design decisions, we observed unexpected behavior in some experiments that we could trace back to issues in the reference implementations of the algorithms. After we fixed the implementations, these anomalies disappeared from our results. It remains an open question whether, and to what extent, previous experiments and evaluations were affected by these issues.

Downloads