Authors: Dennis Wolf, Lena Voigt, Harald Baier
DFRWS USA 2026
Abstract
USB devices are widely used, versatile, and inherently trusted by the USB host, which allows them to bypass even advanced security measures. This may be one reason why USB-based attacks have become more sophisticated and more numerous in recent years and target, for instance, highly sensitive environments such as power plants. These attacks need to be investigated, which is often done in a post-mortem analysis of disk images. The digital forensic community agrees on the importance of digital forensic datasets (e.g., for training and education of experts) and that the dynamic change and general complexity of our IT landscape make their automatic generation indispensable.
Hence, our first goal in this paper is to assess the actual prevalence of USB-related artifacts in public datasets by reviewing available synthetic Windows disk images in a semi-automatic way. We choose Windows because of its large market share and the general availability of datasets. We reveal two limitations regarding USB drive involvement in forensic scenarios. First, disk images for scenarios containing USB drives are difficult to obtain, as most scenarios do not explicitly mention the involvement of removable media, either because it is insignificant to the scenario or because its omission is intentional as part of the challenge design. Second, USB connection artifacts are unrealistic, particularly in contemporary Windows environments. For instance, among the 14 disk images in our dataset using Windows 10 or later, only five contained any traces of USB connections. Moreover, the extent of these traces was limited: only a single contemporary Windows image includes artifacts from more than two different USB storage devices. We assume that a lack of automation leads to such unrealistic datasets.
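To illustrate what counting "different USB storage devices" in an image can look like, the following minimal sketch parses device key names in the form Windows uses under the SYSTEM hive's Enum\USBSTOR key (for example `Disk&Ven_SanDisk&Prod_Ultra&Rev_1.00`) and counts distinct device/serial pairs. It assumes the key names and serial subkey names have already been extracted from the registry by some other tool; the function names and sample values are hypothetical and are not taken from the paper's review procedure.

```python
import re
from typing import Iterable, NamedTuple, Optional, Tuple


class UsbStorDevice(NamedTuple):
    device_type: str
    vendor: str
    product: str
    revision: str


# Key names under Enum\USBSTOR follow the pattern
# <Type>&Ven_<vendor>&Prod_<product>&Rev_<revision>.
_USBSTOR_KEY = re.compile(
    r"^(?P<type>[^&]+)&Ven_(?P<ven>[^&]*)&Prod_(?P<prod>[^&]*)&Rev_(?P<rev>[^&]*)$"
)


def parse_usbstor_key(name: str) -> Optional[UsbStorDevice]:
    """Split a USBSTOR device key name into its fields, or return None."""
    match = _USBSTOR_KEY.match(name)
    if match is None:
        return None
    return UsbStorDevice(
        match.group("type"),
        match.group("ven"),
        match.group("prod"),
        match.group("rev"),
    )


def count_distinct_devices(entries: Iterable[Tuple[str, str]]) -> int:
    """Count distinct (device key, serial subkey) pairs that parse cleanly."""
    seen = set()
    for key_name, serial in entries:
        device = parse_usbstor_key(key_name)
        if device is not None:
            seen.add((device, serial))
    return len(seen)


if __name__ == "__main__":
    # Hypothetical sample entries, as they might be listed from one image.
    sample = [
        ("Disk&Ven_SanDisk&Prod_Ultra&Rev_1.00", "4C530001230506118223&0"),
        ("Disk&Ven_SanDisk&Prod_Ultra&Rev_1.00", "4C530001230506118223&0"),
        ("Disk&Ven_Kingston&Prod_DataTraveler_3.0&Rev_PMAP", "60A44C413C4EB2C0&0"),
    ]
    print(count_distinct_devices(sample))
```

Treating the serial subkey as part of the identity is a design choice of this sketch: two sticks of the same model then count as two devices, while repeated listings of one stick count once.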
We approach this problem by emulating USB drives on Raspberry Pi hardware, since this allows us to configure the USB identifiers that are relevant to a forensic investigation. Furthermore, we examine the commercially available USB Rubber Ducky in this context, as it provides similar functionality. Both solutions are then integrated into a state-of-the-art data synthesis framework to enable the generation of versatile datasets, as demonstrated in two different scenarios. In our subsequent analysis of the generated data, we show that the expected traces were successfully created, making our solution a valid option for easily generating USB traces in future datasets.
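One common way to emulate a USB drive with configurable identifiers on Linux-based boards is the kernel's USB gadget configfs interface. The sketch below is a minimal illustration under that assumption and is not confirmed to be the setup used in this work. The function name, the gadget name, and all identifier values are hypothetical. The `root` parameter defaults to the usual configfs mount point but can point to a scratch directory for a dry run.

```python
import os
from pathlib import Path
from typing import Optional


def write_attr(path: Path, value: str) -> None:
    """Write one attribute value, creating parent directories if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(value + "\n")


def configure_mass_storage_gadget(
    name: str,
    vendor_id: str,
    product_id: str,
    serial: str,
    manufacturer: str,
    product: str,
    backing_file: str,
    root: str = "/sys/kernel/config/usb_gadget",
    udc: Optional[str] = None,
) -> Path:
    """Lay out a mass-storage gadget with caller-chosen USB identifiers."""
    gadget = Path(root).resolve() / name
    gadget.mkdir(parents=True, exist_ok=True)

    # Identifiers the host sees during enumeration.
    write_attr(gadget / "idVendor", vendor_id)
    write_attr(gadget / "idProduct", product_id)

    # English (0x409) string descriptors, including the serial number.
    strings = gadget / "strings" / "0x409"
    write_attr(strings / "serialnumber", serial)
    write_attr(strings / "manufacturer", manufacturer)
    write_attr(strings / "product", product)

    # Mass-storage function backed by a disk image file.
    function = gadget / "functions" / "mass_storage.usb0"
    function.mkdir(parents=True, exist_ok=True)
    write_attr(function / "lun.0" / "file", backing_file)

    # One configuration that exposes the function via a symlink.
    config = gadget / "configs" / "c.1"
    config.mkdir(parents=True, exist_ok=True)
    link = config / "mass_storage.usb0"
    if not link.is_symlink():
        os.symlink(function, link)

    # Binding to a USB device controller activates the gadget (optional here).
    if udc is not None:
        write_attr(gadget / "UDC", udc)
    return gadget


if __name__ == "__main__":
    import tempfile

    # Dry run in a scratch directory; all values are hypothetical.
    with tempfile.TemporaryDirectory() as scratch:
        path = configure_mass_storage_gadget(
            name="g1",
            vendor_id="0x0781",
            product_id="0x5581",
            serial="0123456789ABCDEF",
            manufacturer="ExampleVendor",
            product="ExampleDrive",
            backing_file="/home/pi/usb.img",
            root=scratch,
        )
        print(sorted(p.name for p in path.iterdir()))
```

On real hardware this requires root privileges, a kernel with gadget support, and a controller name for `udc`. Varying the vendor ID, product ID, and serial number between runs is what would let a single board appear to the host as many different devices.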