Authors: Antonio Artur Moura, Napoleão Nepomuceno, Vasco Furtado
DFRWS USA 2024
Abstract
This paper introduces an approach that supports speaker identification in criminal investigations, specifically addressing challenges associated with large volumes of audio recordings featuring unknown speaker identities. Our approach clusters related recordings – potentially from the same person – based on representative voice embeddings extracted using the ECAPA-TDNN speaker recognition model. Grouping audio recordings from the same person enhances variability and richness in voice patterns, thereby improving confidence in automatic speaker recognition. We propose a combination of cosine similarity and a rank-based adjustment function to determine matches of audio clusters with individuals in an enrollment database. Our approach was validated through experiments on a Common Voice-based synthesized dataset and a real-life application involving cell phones seized in prisons, which contained thousands of conversational audio recordings. Results demonstrated satisfactory performance and stability, consistently reducing the pool of candidate speakers for subsequent analysis by a human investigator.