Authors: Sean McKeown, Peter Aaby, Andreas Steyven



Automated comparison of visual content is a contemporary approach to scaling the detection of illegal media and extremist material, both on individual devices and in the cloud. However, the problem is difficult: perceptual similarity algorithms often have weaknesses and anomalous edge cases that may not be clearly documented. Evaluating such tools in order to make the best use of them is also a complex task. To address this, we present PHASER, a still-image perceptual hashing framework that enables forensic specialists and scientists to conduct experiments on bespoke datasets tailored to their individual deployment scenarios. The framework takes a modular approach, allowing users to specify a triplet of perceptual hash, image transform and distance metric, whose behaviour and interactions can then be explored. PHASER is open-source, and we demonstrate its utility through case studies that briefly explore choosing an appropriate dataset size and the potential to optimise the performance of existing algorithms by using learned weight vectors when comparing hashes.
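To make the hash/transform/distance triplet concrete, here is a minimal sketch in plain Python. It assumes a toy average hash computed over a tiny grayscale pixel grid. None of the function names below come from PHASER's API; they are hypothetical stand-ins. The weighted Hamming distance shows one plausible form that a weight vector for comparing hashes could take. The weights in the usage are hand-picked for illustration rather than learned.

```python
from typing import Callable, List, Sequence

Image = List[List[int]]  # hypothetical: grayscale pixel grid, values 0-255
Hash = List[int]         # binary hash, one bit per pixel


def average_hash(img: Image) -> Hash:
    """Toy average hash: bit is 1 where a pixel is brighter than the image mean."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]


def brighten(img: Image, delta: int = 40) -> Image:
    """Transform: add a constant to every pixel, clipped to 255."""
    return [[min(255, p + delta) for p in row] for row in img]


def flip_horizontal(img: Image) -> Image:
    """Transform: mirror each row left to right."""
    return [list(reversed(row)) for row in img]


def hamming(a: Hash, b: Hash) -> float:
    """Normalised Hamming distance in [0, 1]."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def weighted_hamming(a: Hash, b: Hash, w: Sequence[float]) -> float:
    """Hamming distance where a mismatch at bit i costs w[i]; normalised by sum(w)."""
    return sum(wi for x, y, wi in zip(a, b, w) if x != y) / sum(w)


def evaluate(
    hash_fn: Callable[[Image], Hash],
    transform: Callable[[Image], Image],
    dist_fn: Callable[[Hash, Hash], float],
    images: List[Image],
) -> List[float]:
    """Distance between the hash of each original and the hash of its transformed copy."""
    return [dist_fn(hash_fn(img), hash_fn(transform(img))) for img in images]


if __name__ == "__main__":
    img = [[10, 200, 150], [120, 60, 90]]
    # A uniform brightness shift (without clipping) leaves this toy hash unchanged.
    print(evaluate(average_hash, brighten, hamming, [img]))
    # A horizontal flip changes several bits.
    print(evaluate(average_hash, flip_horizontal, hamming, [img]))
    # Down-weighting bit 2 reduces the cost of a mismatch there.
    print(weighted_hamming([0, 1, 0, 1], [0, 1, 1, 1], [2, 2, 1, 1]))
```

Because each component is passed in as a function, swapping the hash, the transform or the distance metric changes only one argument to `evaluate`. This loosely mirrors the modular triplet described above.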