Authors: Gaëtan Michelet, Janine Schneider, Aruna Withanage and Frank Breitinger
DFRWS EU 2026
Abstract
Large language models (LLMs), including systems such as ChatGPT, are increasingly examined for their role in digital forensics. Current research not only surveys their potential applications but also investigates how fine-tuning and model adaptation can enhance performance on specialized forensic tasks. However, the limited understandability and interpretability of their outputs reduce their operational and legal usability. Recently, a new class of reasoning language models has emerged, designed to handle logic-based tasks through an ‘internal reasoning’ mechanism. Yet users typically see only the final answer, not the underlying reasoning. One of these reasoning models is gpt-oss, which can be deployed locally, providing full access to its underlying reasoning process. This article presents the first investigation into the potential of reasoning language models for digital forensics. Four test use cases are examined to assess how well the reasoning component supports the understandability of results. The evaluation combines a new quantitative metric with qualitative analysis. Findings show that, at medium reasoning levels, the reasoning component aids in understanding, interpreting, and validating LLM outputs in digital forensics, but this support is often limited, and higher reasoning levels do not enhance response quality.