Phonetic Richness for Improved Automatic Speaker Verification

Nicholas Klein; Ganesh Sivaraman; Elie Khoury

arXiv:2407.08017 · eess.AS · July 12, 2024

Open Access

TL;DR

This paper introduces a phonetic richness measure that, combined with net-speech duration, gives a better estimate of utterance quality and improves speaker verification accuracy, especially for short utterances with repeated words.

Contribution

It proposes a phonetic richness measure computed from phonetic histograms of utterance transcripts and uses it, together with net speech, to calibrate speaker verification scores.

Findings

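01 Phonetic richness is positively correlated with voice authentication scores across evaluation benchmarks.

02 Calibration with phonetic richness and net speech gives a 5.8% relative EER improvement on VoxCeleb1.

03 The calibration benefit is larger for short utterances with repeated words.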

Abstract

When it comes to authentication in speaker verification systems, not all utterances are created equal. It is essential to estimate the quality of test utterances in order to account for varying acoustic conditions. In addition to the net-speech duration of an utterance, this paper observes that phonetic richness is also a key indicator of utterance quality, playing a significant role in accurate speaker verification. Several phonetic-histogram-based formulations of phonetic richness are explored using transcripts obtained from an automatic speech recognition system. The proposed phonetic richness measure is found to be positively correlated with voice authentication scores across evaluation benchmarks. Additionally, the proposed measure, in combination with net speech, helps calibrate the speaker verification scores, obtaining a relative EER improvement of 5.8% on the…
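The abstract does not say which histogram-based formulations the authors tested, so the following is only an illustrative sketch of one plausible choice: the Shannon entropy of an utterance's phoneme histogram, normalized by the log of the phoneme inventory size. The function name `phonetic_richness`, the default inventory size of 39 (the ARPAbet set used by CMUdict), and the example phoneme sequences are all assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def phonetic_richness(phonemes, inventory_size=39):
    """Normalized entropy of the phoneme histogram (hypothetical formulation).

    phonemes: list of phoneme labels, e.g. from a forced-aligned transcript.
    inventory_size: assumed size of the phoneme set (39 is an assumption).
    Values near 0 mean few distinct phonemes; values near 1 mean phonemes
    are spread evenly over the whole inventory.
    """
    if not phonemes or inventory_size < 2:
        return 0.0
    counts = Counter(phonemes)  # the phonetic histogram
    total = len(phonemes)
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(inventory_size)


# A short utterance that repeats one word ("no no no") ...
repeated = ["N", "OW"] * 3
# ... versus an utterance of the same length with six distinct phonemes.
varied = ["HH", "AH", "L", "OW", "DH", "EH"]

print(phonetic_richness(repeated), phonetic_richness(varied))
```

Under this assumed formulation, the repeated-word utterance scores lower than the varied one even though both contain six phonemes, which is the kind of difference that net-speech duration alone cannot capture.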

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Advanced Data Compression Techniques