Inclusive Speaker Verification with Adaptive Thresholding
Navdeep Jain, Hongcheng Wang

TL;DR
This paper introduces an adaptive thresholding framework for speaker verification that accounts for gender and age, improving inclusivity by reducing false rejection rates across diverse user groups.
Contribution
It proposes a novel context-adaptive thresholding method and a concatenated gender/age detection model to enhance speaker verification fairness and performance.
Findings
Reduced FRR for specific gender groups at a fixed FAR on VoxCeleb1.
Significant FRR reduction for certain age groups on OGI Kids' Speech corpus.
Effective use of prior information and derived context improves SV system inclusivity.
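The comparison behind these findings, FRR per group at one shared target FAR, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names (`threshold_at_far`, `frr`, `calibrate_per_group`), the "accept when the score is strictly above the threshold" convention, and the data layout are all assumptions made for the example.

```python
import numpy as np

def threshold_at_far(impostor_scores, target_far):
    """Pick a threshold whose empirical FAR does not exceed target_far.

    Assumed convention: a trial is accepted when score > threshold.
    """
    scores = np.sort(np.asarray(impostor_scores, dtype=float))
    n = scores.size
    k = int(np.floor(target_far * n))  # impostor trials we may accept
    if k >= n:
        return -np.inf  # every impostor may be accepted
    # Accepting strictly above the (k+1)-th largest impostor score
    # lets through at most k impostor trials.
    return float(scores[n - k - 1])

def frr(genuine_scores, threshold):
    """Fraction of genuine trials rejected at the given threshold."""
    g = np.asarray(genuine_scores, dtype=float)
    return float(np.mean(g <= threshold))

def calibrate_per_group(groups, target_far):
    """Map each group name to its own threshold at the shared target FAR.

    `groups` is a hypothetical layout: {name: {"impostor": [...], "genuine": [...]}}.
    """
    return {
        name: threshold_at_far(trials["impostor"], target_far)
        for name, trials in groups.items()
    }
```

With per-group thresholds in hand, calling `frr` on each group's genuine scores under its own threshold, and again under a single pooled threshold, gives the kind of per-group FRR comparison the findings describe.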
Abstract
When a speaker verification (SV) based system is used in a commercial application, it is important that customers have an inclusive experience irrespective of their gender, age, or ethnicity. In this paper, we analyze the impact of gender and age on SV and find that, for a desired common False Acceptance Rate (FAR) across different gender and age groups, the False Rejection Rate (FRR) differs from group to group. To optimize FRR for all users for a desired FAR, we propose a context (e.g., gender, age) adaptive thresholding framework for SV. The context can be available as prior information for many practical applications. We also propose a concatenated gender/age detection model to algorithmically derive the context in the absence of such prior information. We experimentally show that our context-adaptive thresholding method is effective in building a more efficient…
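At decision time, the framework described above amounts to choosing a threshold from the context, using prior information when it exists and a detector otherwise. The sketch below illustrates that flow under stated assumptions: `select_threshold`, `verify`, the group labels, and the fallback to a global threshold when no context is available are all hypothetical choices for the example, and `detect_context` stands in for the gender/age detection model, whose interface the abstract does not specify.

```python
from typing import Callable, Mapping, Optional

def select_threshold(
    thresholds: Mapping[str, float],
    global_threshold: float,
    prior_context: Optional[str] = None,
    detect_context: Optional[Callable[[object], str]] = None,
    audio: object = None,
) -> float:
    """Return the threshold for this trial's context.

    Prior context wins when provided; otherwise the (hypothetical)
    detector is asked. An unknown or missing context falls back to
    the global threshold, which is an assumption of this sketch.
    """
    context = prior_context
    if context is None and detect_context is not None:
        context = detect_context(audio)
    if context is None:
        return global_threshold
    return thresholds.get(context, global_threshold)

def verify(score: float, threshold: float) -> bool:
    """Accept the claimed identity when the score is strictly above the threshold."""
    return score > threshold
```

For example, with illustrative thresholds `{"child": 0.4, "adult": 0.6}`, a trial scoring 0.5 is accepted when the context is `"child"` and rejected when it is `"adult"`.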
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
