Olivier Thonnard
Marc Dacier

Abstract

Collecting data related to Internet threats has now become a relatively common task for security researchers and network operators. However, the huge amount of raw data can quickly overwhelm those in charge of analyzing it. Systematic analysis procedures are therefore needed to extract useful information from large traffic data sets and so assist the analyst's investigations. This work describes an analysis framework developed specifically to gain insights into honeynet data. Our forensic procedure aims at finding, within an attack data set, groups of network traces that share various kinds of similar patterns. In our exploratory data analysis, we seek to design a flexible clustering tool that can be applied systematically to different feature vectors characterizing the attacks. In this paper, we illustrate the application of our method by analyzing one specific aspect of the honeynet data, i.e., the time series of the attacks. We show that clustering attack patterns with an appropriate similarity measure yields very good candidates for further in-depth investigation, which can help us discover the plausible root causes of the underlying phenomena. Applying our clustering to the attack time series enables us to identify the activities of several worms and botnets in the collected traffic.