VoxWatch: An open-set speaker recognition benchmark on VoxCeleb
Raghuveer Peri, Seyed Omid Sadjadi, Daniel Garcia-Romero

TL;DR
This paper introduces VoxWatch, the first public benchmark for open-set speaker identification (OSI) built on VoxCeleb. It analyzes how watchlist size and speech duration affect detection performance, and evaluates techniques such as score calibration and score fusion.
Contribution
It provides a systematic benchmark for open-set speaker identification, quantifies the impact of watchlist size and speech duration, and evaluates how well common techniques (adaptive score normalization, score calibration, and score fusion) work on this task.
Findings
Adaptive score normalization does not always improve OSI performance.
Score calibration significantly enhances detection accuracy.
Score fusion leads to notable performance improvements.
Abstract
Despite its broad practical applications such as in fraud prevention, open-set speaker identification (OSI) has received less attention in the speaker recognition community compared to speaker verification (SV). OSI deals with determining if a test speech sample belongs to a speaker from a set of pre-enrolled individuals (in-set) or if it is from an out-of-set speaker. In addition to the typical challenges associated with speech variability, OSI is prone to the "false-alarm problem": as the size of the in-set speaker population (a.k.a. the watchlist) grows, the out-of-set scores become larger, leading to increased false alarm rates. This is particularly challenging for applications in financial institutions and border security where the watchlist size is typically of the order of several thousand speakers. Therefore, it is important to systematically quantify the false-alarm problem, and…
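The false-alarm problem described in the abstract can be illustrated with a toy model. The sketch below is not the paper's method or data: it assumes, purely for illustration, that an out-of-set test sample produces independent standard-normal scores against each watchlist speaker, and that the system raises an alarm when the highest score exceeds a fixed threshold. Real speaker scores are correlated and not Gaussian, and the function names here are hypothetical. Under these assumptions the false alarm rate has the closed form 1 - Phi(t)^N for threshold t and watchlist size N, which grows with N.

```python
import math
import random


def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def analytic_false_alarm_rate(threshold, watchlist_size):
    """P(max of N i.i.d. N(0, 1) scores > threshold) = 1 - Phi(threshold)^N.

    Assumption: out-of-set scores are independent standard normals.
    """
    return 1.0 - normal_cdf(threshold) ** watchlist_size


def simulated_false_alarm_rate(threshold, watchlist_size, trials=2000, seed=0):
    """Monte Carlo estimate of the same quantity under the same toy assumption."""
    rng = random.Random(seed)
    alarms = 0
    for _ in range(trials):
        # An out-of-set trial is scored against every watchlist speaker;
        # the detection score is the best match.
        top_score = max(rng.gauss(0.0, 1.0) for _ in range(watchlist_size))
        if top_score > threshold:
            alarms += 1
    return alarms / trials


if __name__ == "__main__":
    threshold = 3.0  # hypothetical fixed decision threshold
    for n in (1, 10, 100, 1000):
        print(n, round(analytic_false_alarm_rate(threshold, n), 4))
```

With the threshold held fixed, the printed rate rises steadily as the watchlist grows, which is the effect the benchmark sets out to quantify on real data.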
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
