Design Guidelines for Inclusive Speaker Verification Evaluation Datasets
Wiebke Toussaint Hutiri, Lauriane Gorce, Aaron Yi Ding

TL;DR
This paper introduces design guidelines and an algorithm for creating inclusive, unbiased speaker verification datasets, addressing current evaluation shortcomings and improving fairness across diverse speaker demographics.
Contribution
It proposes a schema for grading utterance pair difficulty and an algorithm for generating inclusive datasets, validated on VoxCeleb1.
Findings
Difficulty grading impacts evaluation variability.
Number of utterance pairs per speaker affects performance.
Inclusive dataset design enhances fairness in SV evaluation.
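The second finding suggests that the number of utterance pairs drawn per speaker should be controlled explicitly rather than left to chance. The sketch below is a minimal illustration of that idea only, not the paper's algorithm: it does not implement the difficulty-grading schema, and the function name `sample_balanced_pairs`, its parameters, and the input layout (a dict mapping speaker IDs to utterance IDs) are all hypothetical.

```python
import random
from itertools import combinations


def sample_balanced_pairs(utterances, pairs_per_speaker, seed=0):
    """Draw the same number of trial pairs for every speaker.

    utterances: dict mapping speaker id -> list of utterance ids (assumed layout).
    Returns a list of (utt_a, utt_b, label), label 1 = same speaker, 0 = different.
    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    speakers = sorted(utterances)
    pairs = []
    for spk in speakers:
        # Same-speaker pairs: all unordered pairs of this speaker's utterances.
        same = list(combinations(utterances[spk], 2))
        # Different-speaker pairs: this speaker's utterances against everyone else's.
        diff = [
            (a, b)
            for a in utterances[spk]
            for other in speakers
            if other != spk
            for b in utterances[other]
        ]
        if len(same) < pairs_per_speaker or len(diff) < pairs_per_speaker:
            raise ValueError(f"not enough utterances for speaker {spk!r}")
        # Sample without replacement so no pair is repeated for a speaker.
        pairs += [(a, b, 1) for a, b in rng.sample(same, pairs_per_speaker)]
        pairs += [(a, b, 0) for a, b in rng.sample(diff, pairs_per_speaker)]
    return pairs


if __name__ == "__main__":
    toy = {
        "s1": ["s1_a", "s1_b", "s1_c"],
        "s2": ["s2_a", "s2_b", "s2_c"],
        "s3": ["s3_a", "s3_b", "s3_c"],
    }
    for pair in sample_balanced_pairs(toy, pairs_per_speaker=2):
        print(pair)
```

Each speaker contributes exactly `pairs_per_speaker` same-speaker pairs and is the first-listed speaker in exactly `pairs_per_speaker` different-speaker pairs. A speaker can still appear more often as the second element of another speaker's different-speaker pairs, so a stricter balance would need an additional constraint.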
Abstract
Speaker verification (SV) provides billions of voice-enabled devices with access control, and ensures the security of voice-driven technologies. As a form of biometrics, SV must be unbiased, with consistent and reliable performance across speakers irrespective of their demographic, social and economic attributes. Current SV evaluation practices are insufficient for evaluating bias: they are over-simplified and aggregate users, they are not representative of real-life usage scenarios, and they do not account for the consequences of errors. This paper proposes design guidelines for constructing SV evaluation datasets that address these shortcomings. We propose a schema for grading the difficulty of utterance pairs, and present an algorithm for generating inclusive SV datasets. We empirically validate our proposed method in a set of experiments on the VoxCeleb1 dataset. Our results…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Speech and Audio Processing
