Natural Language Processing (NLP) for Digital Forensics

Authors: Kostadin Damevski, Irfan Ahmed, Mia Mohammad Imran, Hala Ali

DFRWS USA 2023

Abstract

Digital forensics investigations often involve the analysis of text, such as text messages, e-mails, and forum posts. Text analysis in digital forensics aims to uncover valuable information and hidden patterns in large volumes of digital text to assist investigations. This type of analysis can help identify pertinent evidence, trace suspects, and build a case. It can also help uncover cyber threats and fraud by examining evidence in e-mails, social media, and other forms of digital communication involved in cyber-attacks and financial crimes.

Modern Natural Language Processing (NLP) techniques can greatly improve the efficiency of text analysis in digital forensics. For instance, NLP pre-processing techniques such as tokenization, stemming, and named entity recognition (NER) can help extract relevant information from unstructured digital evidence more efficiently and effectively. NLP analysis techniques such as clustering, text summarization, and categorization can also help identify patterns and relationships in text data that might otherwise be difficult to detect. Additionally, text visualization techniques such as word clouds, network visualizations, and topic modeling create meaningful visual representations of text data, which can reveal patterns and relationships in the text and make the analysis easier to interpret.
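As a concrete illustration of the pre-processing step, the following is a minimal, standard-library-only Python sketch that tokenizes messages, removes stop words, stems the remaining tokens, and counts term frequencies (the raw counts a word cloud would visualize). Everything here is an illustrative assumption rather than the workshop's own material: the function names, the small stop-word list, the suffix-stripping stemmer (a crude stand-in, not the Porter or Snowball algorithm), and the e-mail regex (a simple rule-based stand-in for statistical NER).

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption; real pipelines use much larger lists).
STOP_WORDS = {"a", "an", "and", "at", "i", "in", "is", "of", "on", "the", "these", "to", "you"}

# Toy suffix list for a crude stemmer; this is NOT the Porter algorithm.
SUFFIXES = ("ing", "ed", "es", "s")

# Rule-based stand-in for NER: matches simple e-mail addresses only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def naive_stem(token):
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            stem = token[: -len(suffix)]
            # Undo consonant doubling ("transferr" -> "transfer"), loosely echoing Porter.
            if (suffix in ("ing", "ed") and len(stem) >= 2
                    and stem[-1] == stem[-2] and stem[-1] not in "aeiouylsz"):
                stem = stem[:-1]
            return stem
    return token


def preprocess(text):
    """Tokenize, drop stop words, and stem the remaining tokens."""
    return [naive_stem(t) for t in tokenize(text) if t not in STOP_WORDS]


def extract_emails(text):
    """Return all e-mail-like strings in the text, in order of appearance."""
    return EMAIL_RE.findall(text)


def top_terms(messages, n=3):
    """Count stemmed terms across all messages and return the n most frequent."""
    counts = Counter()
    for message in messages:
        counts.update(preprocess(message))
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical chat messages, invented for illustration only.
    messages = [
        "Transfer the funds to the offshore account tonight",
        "Funds transferred. Delete these messages",
        "Transferring remaining funds at noon",
    ]
    print(top_terms(messages, 2))
    print(extract_emails("Send the receipts to bob@example.com and cc alice.w@mail.example.org today."))
```

Stemming is what lets "transfer", "transferred", and "transferring" count as one term, so the shared subject of the three messages surfaces even though no single surface form repeats. In practice, libraries such as NLTK or spaCy provide proper stemmers, lemmatizers, and trained NER models in place of these toy rules.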

Through this workshop on using NLP for forensic text analysis, participants will improve their ability to extract valuable information from large amounts of digital forensics text data, which can be critical for investigations and decision-making.

Bios

Kostadin Damevski

Kostadin Damevski is an Associate Professor in the Department of Computer Science at Virginia Commonwealth University. Previously, he was a faculty member in the Department of Computer Science at Virginia State University and a postdoctoral research assistant at the Scientific Computing and Imaging Institute at the University of Utah. Damevski has several years of experience applying NLP to software engineering datasets, including chats, developer forums, and changesets. He also has experience creating software engineering tools that are used by numerous developers in the field. More information is available on Damevski’s webpage: https://damevski.github.io/

Irfan Ahmed

Irfan Ahmed is an Associate Professor in the Department of Computer Science at Virginia Commonwealth University, where he runs the Security and Forensics Engineering (SAFE) Lab, which focuses on digital forensics, malware analysis, and industrial control systems. Before joining VCU, he was a Canizaro-Livingston Endowed Assistant Professor in Cybersecurity at the University of New Orleans (UNO), New Orleans, LA. His research group regularly publishes at DFRWS conferences and is currently developing the next DFRWS challenge, “The Troubled Elevator.” More information is available on Ahmed’s webpage: https://people.vcu.edu/~iahmed3/

Mia Mohammad Imran

Mia Mohammad Imran is a Ph.D. student at the Department of Computer Science at Virginia Commonwealth University. His research interests include Software Engineering, Machine Learning, and Natural Language Processing.

Hala Ali

Hala Ali is a Ph.D. student in the Department of Computer Science at Virginia Commonwealth University, Richmond, Virginia. Her research interests include Digital Forensics, Cyber-Physical Systems Security, Information Security, and IoT.
