Authors: Zainab Khalid, Farkhund Iqbal, Mohd Saqib

DFRWS USA 2025 — “History in the Making” — Jubilee 25th Anniversary

Abstract

Artificial Intelligence (AI) has found multi-faceted applications in critical sectors, including Digital Forensics (DF), where eXplainability (XAI) is non-negotiable for applicability, for instance in the admissibility of expert evidence in a court of law. State-of-the-art XAI workflows focus primarily on supervised learning, even though unsupervised learning may be practically more relevant in DF and other sectors that continuously produce considerable volumes of complex, unlabeled data. This research study explores the challenges and utility of unsupervised-learning-based XAI for DF's complex datasets. A memory forensics case scenario is implemented to detect anomalies and cluster obfuscated malware using the Isolation Forest, Autoencoder, K-means, DBSCAN, and Gaussian Mixture Model (GMM) unsupervised algorithms at three categorical levels. The CIC MalMemAnalysis-2022 dataset's binary and multivariate (4- and 16-class) categories are used as a reference for clustering. The anomaly detection and clustering results are evaluated using accuracy, confusion matrices, and the Adjusted Rand Index (ARI), and explained through SHapley Additive exPlanations (SHAP) via local and global force, waterfall, scatter, summary, and bar plots. We also explore how some SHAP explanations may be used for dimensionality reduction.
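The following is a minimal sketch, not the authors' code, of the pipeline the abstract describes: fitting one of the named unsupervised detectors (Isolation Forest) on memory features without labels, scoring the result against the binary reference labels with ARI and a confusion matrix, and producing a global SHAP explanation. The CSV file name and the "Class"/"Category" column names are assumptions about the CIC MalMemAnalysis-2022 layout.

```python
# Hedged sketch of the abstract's pipeline: unsupervised anomaly detection,
# label-based evaluation (ARI, confusion matrix), and SHAP explanation.
import pandas as pd
import shap
from sklearn.ensemble import IsolationForest
from sklearn.metrics import adjusted_rand_score, confusion_matrix
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("Obfuscated-MalMem2022.csv")        # hypothetical file name
features = df.drop(columns=["Class", "Category"])    # assumed label columns
y = (df["Class"] == "Malware").astype(int)           # binary reference labels
X = StandardScaler().fit_transform(features)

# Fit the detector WITHOUT labels; contamination=0.5 assumes the dataset is
# balanced between benign and malicious memory dumps.
iso = IsolationForest(n_estimators=100, contamination=0.5, random_state=0)
pred = (iso.fit_predict(X) == -1).astype(int)        # -1 = anomaly -> malware

print("ARI:", adjusted_rand_score(y, pred))          # agreement with labels
print(confusion_matrix(y, pred))

# Global SHAP explanation of the isolation scores; TreeExplainer supports
# scikit-learn's IsolationForest directly. Subsampled here for speed.
explainer = shap.TreeExplainer(iso)
shap_values = explainer.shap_values(X[:500])
shap.summary_plot(shap_values, X[:500], feature_names=features.columns)
```

The same evaluation and explanation steps would apply to the other detectors and clusterers named above (Autoencoder, K-means, DBSCAN, GMM), with a model-agnostic SHAP explainer substituted where TreeExplainer does not apply.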
