Authors: Frank Breitinger, Alexandre Jotterand



During the last few years, there have been numerous changes concerning datasets for digital forensics, such as the development of data generation frameworks and the newly released CFReDS website by NIST. In addition, it is becoming mandatory (e.g., when required by funding agencies) to share datasets and to publish them in a manner that allows them to be found and processed. The core of this article is a novel taxonomy that should be used to structure the data commonly used in the domain, complementing existing methods. Based on the taxonomy, we argue that it is not always necessary to release a dataset, e.g., in the case of random data. In addition, we address the legal aspects of sharing data. Lastly, as a minor contribution, we provide a separation of the terms structured, semi-structured, and unstructured data, for which there is currently no consensus in the community.