Authors: Xiaoyu Du (University College Dublin), Chris Hargreaves (University of Oxford), John Sheppard (Waterford Institute of Technology), and Mark Scanlon (University College Dublin)
DFRWS APAC 2021
Best Student Paper – APAC 2021
Abstract
Digital forensic test images are commonly used across a variety of digital forensic use cases, including education and training, tool testing and validation, proficiency testing, malware analysis, and research and development. Using real digital evidence for these purposes is often not viable or permissible, especially when factoring in the ethical and, in some cases, legal considerations of working with individuals’ personal data. Furthermore, when using real data, it is usually not known which actions were performed and when, i.e. the ‘ground truth’ is unavailable. The creation of synthetic digital forensic test images typically involves an arduous, time-consuming process of manually performing a list of actions, or following a ‘story’, to generate artefacts in a subsequently imaged disk. Besides the manual effort and time needed to execute the relevant actions in the scenario, there is often little room to build a realistic volume of non-pertinent wear-and-tear or ‘background noise’ on the suspect device, meaning the resulting disk images are inherently limited and, to a certain extent, simplistic.
This work presents the TraceGen framework, an automated system focused on the emulation of user actions to create realistic and comprehensive artefacts in an auditable and reproducible manner. The framework consists of a series of actions contained within scripts that are executed both externally and internally to a target virtual machine. These actions use existing automation APIs to emulate a real user’s behaviour on a Windows system, and can be quickly scripted together to form complex stories or to emulate wear-and-tear on the test image. In addition to the development of the framework, an evaluation is performed of its ability to produce background artefacts at scale and of the realism of those artefacts compared with their human-generated counterparts.
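To make the idea of scripting actions into a story more concrete, the following is a minimal Python sketch of how such a composition might look. It is not TraceGen's actual API: the `Action` and `Story` classes, the `build_demo_story` helper, and the action names are all hypothetical, and the actions themselves are placeholders that do nothing. The sketch only illustrates two properties named in the abstract: actions are tagged as running externally or internally to the virtual machine, and every executed step is recorded so that the run is auditable and reproducible.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List

VALID_LOCATIONS = {"external", "internal"}


@dataclass
class Action:
    """One emulated user action (hypothetical structure, not TraceGen's)."""
    name: str
    location: str  # "external": driven from the host; "internal": run inside the VM
    run: Callable[[], None]

    def __post_init__(self) -> None:
        if self.location not in VALID_LOCATIONS:
            raise ValueError(f"unknown location: {self.location!r}")


@dataclass
class Story:
    """An ordered list of actions plus an audit log of what was executed."""
    title: str
    actions: List[Action] = field(default_factory=list)
    log: List[Dict[str, object]] = field(default_factory=list)

    def add(self, action: Action) -> "Story":
        self.actions.append(action)
        return self  # allow chaining

    def execute(self) -> List[Dict[str, object]]:
        # Run each action in order and record the ground truth of the run.
        for step, action in enumerate(self.actions, start=1):
            started = datetime.now(timezone.utc).isoformat()
            action.run()
            self.log.append({
                "step": step,
                "action": action.name,
                "location": action.location,
                "started_utc": started,
            })
        return self.log


def build_demo_story() -> Story:
    """Assemble a small story from placeholder actions (assumed names)."""
    noop = lambda: None  # stand-in for a real automation call
    return (
        Story("demo")
        .add(Action("start_vm", "external", noop))
        .add(Action("browse_web", "internal", noop))
        .add(Action("create_document", "internal", noop))
        .add(Action("shutdown_vm", "external", noop))
    )
```

Calling `build_demo_story().execute()` returns the ordered log of steps. In a real system, the placeholder callables would be replaced by calls into whatever automation APIs are used, and the log would serve as the ground truth for the resulting disk image.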