Forensic String Search Tool Quirks or What I Learned Testing String Search Tools

Authors: James Lyle

DFRWS USA 2019

Abstract

One might expect that if the results of an indexed search of a test image are compared with the results of a live search of the same test image by the same forensic tool, the results would be the same. It turns out that this is not always the case, because the two search engines may implement quite different algorithms. One would also expect that the same regular expression could be used in different forensic tools; however, the regular expressions accepted by one tool may differ from those accepted by another, because the tools implement regular expression matching differently.

This presentation reports on experience gained testing several forensic string search tools. In general, the search tools do a good job of finding target strings, but we report on several unexpected behaviors exhibited by the tools. The test data set included a mix of common data features likely to be encountered and a few significant but less frequently encountered ones. Target strings were placed in active files, recoverable deleted files, and unallocated space. Strings were encoded in ASCII, UTF-8, UTF-16BE, and UTF-16LE. Files were located in a mix of Windows (FAT, exFAT, and NTFS), Linux (ext4), and Mac (HFS and APFS) file systems.

Unicode strings exhibit several interesting quirks. For example, a match to a UTF-16 string is sometimes reported as two hits for strings of Latin-based characters but as only one hit for strings of non-Latin-based characters. One tool that offered a built-in search for Social Security numbers gave different results for live and indexed search: one search engine filtered out obviously fake numbers, but the other did not. These are just a few of the quirks discussed in this presentation.
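The abstract does not say what causes the UTF-16 double-hit quirk. The Python sketch below shows one mechanism that can produce that pattern; it is an assumption for illustration, not a description of any tested tool. It models a hypothetical naive live search (`naive_live_search`, `encoded_patterns`, and `find_all` are invented names) that scans raw bytes for the target in each of the four encodings independently and reports every match. A Latin-character string stored as UTF-16LE and preceded by a 0x00 byte (common in null-padded data) also matches the UTF-16BE pattern one byte earlier, because every Latin character has a 0x00 high byte. In the same null-padded context, a Cyrillic string yields only one hit, because the UTF-16BE pattern would need a 0x04 byte in front of it.

```python
# Hypothetical model of a naive live (byte-level) search; not any real tool's code.
ENCODINGS = ("ascii", "utf-8", "utf-16-be", "utf-16-le")


def encoded_patterns(target: str) -> dict[str, bytes]:
    """Byte pattern for `target` in each encoding that can represent it."""
    patterns: dict[str, bytes] = {}
    for enc in ENCODINGS:
        try:
            pattern = target.encode(enc)
        except UnicodeEncodeError:
            continue  # e.g. Cyrillic text has no ASCII form
        if pattern not in patterns.values():
            patterns[enc] = pattern  # ASCII text: utf-8 pattern == ascii pattern, keep one
    return patterns


def find_all(data: bytes, pattern: bytes) -> list[int]:
    """Every byte offset where `pattern` occurs in `data`, overlapping matches included."""
    hits = []
    pos = data.find(pattern)
    while pos != -1:
        hits.append(pos)
        pos = data.find(pattern, pos + 1)
    return hits


def naive_live_search(data: bytes, target: str) -> list[tuple[str, int]]:
    """Report (encoding, offset) for every raw-byte match, sorted by offset."""
    hits = [
        (enc, offset)
        for enc, pattern in encoded_patterns(target).items()
        for offset in find_all(data, pattern)
    ]
    return sorted(hits, key=lambda h: (h[1], h[0]))


# One UTF-16LE string in each buffer, surrounded by null padding.
latin = b"\x00" + "abc".encode("utf-16-le") + b"\x00"
cyrillic = b"\x00" + "абв".encode("utf-16-le") + b"\x00"

print(naive_live_search(latin, "abc"))     # [('utf-16-be', 0), ('utf-16-le', 1)]
print(naive_live_search(cyrillic, "абв"))  # [('utf-16-le', 1)]
```

Under these assumptions, the single Latin string is reported twice (as UTF-16BE at offset 0 and UTF-16LE at offset 1), while the Cyrillic string is reported once. A tool could avoid this by merging hits whose byte ranges overlap, but whether the tested tools behave this way, or for this reason, is not stated in the abstract.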
