Phonetic Richness for Improved Automatic Speaker Verification
Nicholas Klein, Ganesh Sivaraman, Elie Khoury

TL;DR
This paper introduces a phonetic richness measure that improves speaker verification accuracy by better estimating utterance quality, especially for short or challenging recordings.
Contribution
It proposes a novel phonetic richness metric based on phonetic histograms, which improves score calibration and accuracy in speaker verification systems.
Findings
Positive correlation between phonetic richness and verification scores
Achieved a 5.8% relative reduction in equal error rate (EER) on VoxCeleb1
Improved calibration for short, repeated-word utterances
Abstract
When it comes to authentication in speaker verification systems, not all utterances are created equal. It is essential to estimate the quality of test utterances in order to account for varying acoustic conditions. In addition to the net-speech duration of an utterance, it is observed in this paper that phonetic richness is also a key indicator of utterance quality, playing a significant role in accurate speaker verification. Several phonetic-histogram-based formulations of phonetic richness are explored using transcripts obtained from an automatic speech recognition system. The proposed phonetic richness measure is found to be positively correlated with voice authentication scores across evaluation benchmarks. Additionally, the proposed measure in combination with net speech helps in calibrating the speaker verification scores, obtaining a relative EER improvement of 5.8% on the…
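The abstract mentions phonetic richness formulations built on phonetic histograms, but this summary does not give the paper's exact definitions. The following is a minimal sketch of two plausible histogram-based measures, written under stated assumptions: the function names, the choice of distinct-phone count and normalized entropy as measures, and the 39-phone inventory size are all illustrative and are not taken from the paper.

```python
import math
from collections import Counter


def phone_histogram(phones):
    """Count how often each phone label occurs in a transcript."""
    return Counter(phones)


def distinct_phone_count(phones):
    """Hypothetical richness measure 1: number of distinct phones observed."""
    return len(set(phones))


def normalized_entropy(phones, inventory_size=39):
    """Hypothetical richness measure 2: Shannon entropy of the phone
    histogram divided by log(inventory_size), giving a value in [0, 1].

    inventory_size=39 is an assumption (an ARPAbet-style phone set).
    """
    counts = phone_histogram(phones)
    total = sum(counts.values())
    if total == 0 or inventory_size < 2:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(inventory_size)


# A repeated-word utterance ("no no no") versus a more varied one
# ("my voice is my password"), both as made-up ARPAbet-style transcripts.
repeated = ["N", "OW"] * 3
varied = ["M", "AY", "V", "OY", "S", "IH", "Z", "M", "AY",
          "P", "AE", "S", "W", "ER", "D"]

print(distinct_phone_count(repeated), normalized_entropy(repeated))
print(distinct_phone_count(varied), normalized_entropy(varied))
```

Under either measure the repeated-word utterance scores lower than the varied one, which illustrates why a richness signal can add information beyond net-speech duration when judging short, repetitive test utterances.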
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Advanced Data Compression Techniques
