Voice Biometrics Security: Extrapolating False Alarm Rate via Hierarchical Bayesian Modeling of Speaker Verification Scores
Alexey Sholokhov, Tomi Kinnunen, Ville Vestman, Kong Aik Lee

TL;DR
This paper introduces a Bayesian modeling framework to predict false alarm rates in speaker verification systems as the database size grows, addressing security concerns with confusable impostors.
Contribution
It proposes a novel performance assessment method for extrapolating speaker verification security beyond current datasets using hierarchical Bayesian modeling.
Findings
Neither i-vector nor x-vector systems are immune to increased false alarms with larger impostor databases.
The framework can predict security performance for arbitrarily large speaker datasets.
Analysis on VoxCeleb datasets demonstrates the method's applicability.
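To see why a larger impostor pool weakens security, consider a simplified calculation. It is not the paper's hierarchical Bayesian model. It assumes that impostor trials against a given target are independent and that each has the same single-trial false alarm probability. The function name `worst_case_false_alarm` and the parameters `p_single` and `n_impostors` are hypothetical, introduced only for this sketch.

```python
import math


def worst_case_false_alarm(p_single: float, n_impostors: int) -> float:
    """Probability that at least one of n_impostors is falsely accepted.

    Assumption (not from the paper): trials are independent and share
    the same single-trial false alarm probability p_single, 0 <= p_single < 1.
    """
    # 1 - (1 - p)^N, computed with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(n_impostors * math.log1p(-p_single))


if __name__ == "__main__":
    # Illustrative values only; not results from the paper.
    for n in (1, 100, 10_000, 1_000_000):
        print(n, worst_case_false_alarm(1e-4, n))
```

Under these assumptions the probability of finding at least one accepted impostor rises toward 1 as the pool grows, even when the single-trial rate is small. The paper's framework replaces the independence and equal-rate assumptions with a hierarchical Bayesian model of the verification scores.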
Abstract
How secure is automatic speaker verification (ASV) technology? More concretely, given a specific target speaker, how likely is it to find another person who gets falsely accepted as that target? This question may be addressed empirically by studying naturally confusable pairs of speakers within a large enough corpus. To this end, one might expect to find at least some speaker pairs that are indistinguishable from each other in terms of ASV. To a certain extent, such an aim is mirrored in the standardized ASV evaluation benchmarks. However, the number of speakers in such evaluation benchmarks represents only a small fraction of all possible human voices, making it challenging to extrapolate performance beyond a given corpus. Furthermore, the impostors used in performance evaluation are usually selected randomly. A potentially more meaningful definition of an impostor - at least in the…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
