Voice Biometrics Security: Extrapolating False Alarm Rate via Hierarchical Bayesian Modeling of Speaker Verification Scores

Alexey Sholokhov; Tomi Kinnunen; Ville Vestman; Kong Aik Lee

arXiv:1911.01182 · eess.AS · November 5, 2019

TL;DR

This paper introduces a Bayesian modeling framework that predicts how the false alarm rate of a speaker verification system grows with the size of the speaker database, addressing the security risk posed by naturally confusable impostors.

Contribution

It proposes a novel performance assessment method for extrapolating speaker verification security beyond current datasets using hierarchical Bayesian modeling.
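As a rough illustration of why false alarms grow with database size (this is not the authors' hierarchical Bayesian method), suppose each impostor is falsely accepted against a given target independently and with the same probability p at a fixed threshold. The probability that at least one of N impostors is accepted is then 1 - (1 - p)^N. The function name and the value p = 1e-4 below are hypothetical, chosen only for illustration.

```python
import math


def prob_any_false_alarm(p: float, n_impostors: int) -> float:
    """P(at least one of n_impostors is falsely accepted against one target).

    Illustrative assumption: each impostor is accepted independently with
    the same probability p at a fixed threshold, giving 1 - (1 - p)**N.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n_impostors < 0:
        raise ValueError("n_impostors must be non-negative")
    if n_impostors == 0:
        return 0.0
    if p == 1.0:
        return 1.0
    # log1p/expm1 avoid cancellation when p is tiny and n_impostors is huge.
    return -math.expm1(n_impostors * math.log1p(-p))


# Hypothetical per-impostor false alarm probability, for illustration only.
for n in (10, 1_000, 100_000):
    print(n, prob_any_false_alarm(1e-4, n))
```

With p = 1e-4 the probability rises from about 0.1% with ten impostors to near certainty with 100,000. The independence assumption ignores the fact that some target speakers may be easier to impersonate than others, which is the kind of structure a hierarchical score model can represent.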

Findings

01. Neither i-vector nor x-vector systems are immune to increased false alarms with larger impostor databases.

02. The framework can predict security performance for arbitrarily large speaker datasets.

03. Analysis on VoxCeleb datasets demonstrates the method's applicability.

Abstract

How secure is automatic speaker verification (ASV) technology? More concretely, given a specific target speaker, how likely is it to find another person who gets falsely accepted as that target? This question may be addressed empirically by studying naturally confusable pairs of speakers within a large enough corpus. To this end, one might expect to find at least some speaker pairs that are indistinguishable from each other in terms of ASV. To a certain extent, such an aim is mirrored in the standardized ASV evaluation benchmarks. However, the number of speakers in such evaluation benchmarks represents only a small fraction of all possible human voices, making it challenging to extrapolate performance beyond a given corpus. Furthermore, the impostors used in performance evaluation are usually selected randomly. A potentially more meaningful definition of an impostor - at least in the…
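The question of how likely it is to find a falsely accepted impostor can be made concrete with a small simulation. The sketch below is not the authors' model: it assumes a hypothetical three-level Gaussian model of impostor scores (a target-level offset, an impostor-pair mean, and a trial score), and every parameter name and default value is invented for illustration. It estimates by Monte Carlo the probability that at least one of N impostors exceeds a fixed decision threshold.

```python
import math
import random


def simulate_any_false_alarm(
    n_impostors: int,
    threshold: float,
    mu0: float = 0.0,
    tau_target: float = 0.5,
    tau_pair: float = 1.0,
    sigma_w: float = 1.0,
    n_targets: int = 1000,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of P(at least one of n_impostors is accepted).

    Hypothetical three-level Gaussian score model (illustrative assumption):
      target offset b_t  ~ N(0, tau_target^2)        shared by all impostors of t
      pair mean     m_tj ~ N(mu0 + b_t, tau_pair^2)  confusability of pair (t, j)
      trial score   s    ~ N(m_tj, sigma_w^2)        one impostor trial
    """
    if n_impostors < 0 or n_targets <= 0 or sigma_w <= 0:
        raise ValueError("need n_impostors >= 0, n_targets > 0, sigma_w > 0")
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_targets):
        b_t = rng.gauss(0.0, tau_target)
        log_none = 0.0  # log P(no impostor of this target is accepted)
        for _ in range(n_impostors):
            m_tj = rng.gauss(mu0 + b_t, tau_pair)
            # P(s > threshold | m_tj) = 1 - Phi((threshold - m_tj) / sigma_w)
            p_tj = 0.5 * math.erfc((threshold - m_tj) / (sigma_w * math.sqrt(2.0)))
            if p_tj >= 1.0:
                log_none = -math.inf
                break
            log_none += math.log1p(-p_tj)
        total += -math.expm1(log_none)  # 1 - P(no impostor accepted)
    return total / n_targets


for n in (10, 100, 1000):
    print(n, simulate_any_false_alarm(n, threshold=4.0))
```

With tau_target = 0 the impostors are independent for every target, and the estimate approaches the closed form 1 - (1 - p)^N, where p is the tail probability of the marginal score distribution N(mu0, tau_pair^2 + sigma_w^2). A positive tau_target concentrates false acceptances on a subset of easily impersonated targets, so the average risk grows more slowly with N than the independent case would suggest.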

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing