Adrien Larbanet
Jonas Lerebours
Jean Pierre David

Abstract

Internet traffic monitoring is an increasingly challenging task because of the high bandwidths involved, especially at Internet Service Provider routers and Internet backbones. We propose a parallel implementation of the max-hashing algorithm that enables the detection of millions of referenced files by deep packet inspection over high-bandwidth connections. We also propose a method for extracting high-entropy signatures from MP4 files that is compatible with the max-hashing algorithm and keeps false positive rates low. The system first computes a set of fingerprints, which are small subsets of the referenced files that are a priori unique and easily identifiable. At detection time, the max-hashing algorithm eliminates the need to reconstruct the flows. A Graphics Processing Unit (GPU) card computes the fingerprints of all the IP packets in parallel and searches for hits in the onboard collection of fingerprints. Our application, dedicated to the detection of known MP4 video files, enables the detection of millions of fingerprints and demonstrates a sustained processing rate of 50 Gbps per card. Furthermore, no false positives were observed in our 28.25 GB transfer test. The proposed implementation also flags suspect flows based on IP addresses and ports so that deeper investigations can be carried out offline.

© 2015 The Authors. Published by Elsevier Ltd on behalf of DFRWS. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
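To illustrate why a maximum-based fingerprint can be matched packet by packet, without reconstructing flows, the following is a minimal CPU-side sketch in Python. It is not the authors' GPU implementation. The window length `WINDOW`, the minimum payload length `SPAN`, the use of BLAKE2b as the hash, the file name `clip.mp4`, and all function names are assumptions made for illustration only. In this simplified variant, the reference side stores the largest window hash of every `SPAN`-byte stretch of a file, and the detection side looks up the single largest window hash of each payload.

```python
import hashlib

WINDOW = 16   # bytes hashed per candidate fingerprint (assumed value)
SPAN = 256    # shortest payload, in bytes, expected at detection time (assumed value)


def window_hashes(data: bytes, window: int = WINDOW) -> list[int]:
    """Hash every `window`-byte substring of `data` (stand-in hash: 64-bit BLAKE2b)."""
    return [
        int.from_bytes(
            hashlib.blake2b(data[i:i + window], digest_size=8).digest(), "big"
        )
        for i in range(len(data) - window + 1)
    ]


def reference_fingerprints(data: bytes, window: int = WINDOW, span: int = SPAN) -> set[int]:
    """Collect the maximum window hash of every `span`-byte stretch of a reference file."""
    hashes = window_hashes(data, window)
    per_span = span - window + 1  # number of windows that fit in one stretch
    return {
        max(hashes[i:i + per_span])
        for i in range(len(hashes) - per_span + 1)
    }


def packet_fingerprint(payload: bytes, window: int = WINDOW):
    """Return the largest window hash in one payload, or None if it is too short."""
    hashes = window_hashes(payload, window)
    return max(hashes) if hashes else None


def inspect(payload: bytes, database: dict[int, str]):
    """Look up a single payload on its own; no flow reassembly is involved."""
    fingerprint = packet_fingerprint(payload)
    return database.get(fingerprint) if fingerprint is not None else None


# Hypothetical reference file and two payloads: one cut from the file, one unrelated.
reference = bytes((i * 37 + i // 256) % 256 for i in range(4096))
database = {fp: "clip.mp4" for fp in reference_fingerprints(reference)}
print(inspect(reference[1000:1400], database))
print(inspect(bytes(400), database))
```

In this sketch, any payload of at least `SPAN` bytes that lies entirely inside the reference file is matched regardless of where it starts, because its largest window hash is also the largest hash of some `SPAN`-byte stretch of the file. A production system would use a cheaper rolling hash and a far more compact fingerprint set, and would run the per-packet step in parallel.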