Speaker Recognition with Random Digit Strings Using Uncertainty Normalized HMM-based i-vectors
Nooshin Maghsoodi, Hossein Sameti, Hossein Zeinali, Themos Stafylakis

TL;DR
This paper introduces a novel HMM-based i-vector approach for text-dependent speaker recognition with random digit strings, demonstrating state-of-the-art accuracy and robustness even without extensive channel compensation.
Contribution
The paper proposes a digit-specific i-vector extraction method combined with uncertainty modeling, improving speaker recognition performance on the RSR2015 and RedDots datasets.
Findings
Achieves 1.52% and 1.77% EER on RSR2015 part III for male and female speakers, respectively.
Outperforms x-vector systems trained on large datasets.
State-of-the-art results obtained with a single, simple system without extensive channel compensation.
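The findings above are reported as Equal Error Rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. As a rough illustration of the metric only, the following sketch estimates EER from two lists of trial scores. The function name `equal_error_rate` and the threshold sweep are assumptions made for this example; they are not taken from the paper, and evaluation toolkits may interpolate between thresholds differently.

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Estimate EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= the threshold.
    Hypothetical helper for illustration, not the paper's evaluation code.
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, nontarget]))

    # False rejection rate: fraction of target trials scored below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    # False acceptance rate: fraction of non-target trials at or above the threshold.
    far = np.array([(nontarget >= t).mean() for t in thresholds])

    # Pick the threshold where the two error rates are closest and average them.
    idx = np.argmin(np.abs(far - frr))
    return 0.5 * (far[idx] + frr[idx])

eer = equal_error_rate([2.0, 3.0, 4.0, 5.0], [0.0, 1.0, 2.5, 1.5])
print(f"EER = {100 * eer:.2f}%")
```

With only a handful of scores the estimate is coarse; on a full trial list the sweep gives a finer approximation.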
Abstract
In this paper, we combine Hidden Markov Models (HMMs) with i-vector extractors to address the problem of text-dependent speaker recognition with random digit strings. We employ digit-specific HMMs to segment the utterances into digits, to perform frame alignment to HMM states and to extract Baum-Welch statistics. By making use of the natural partition of input features into digits, we train digit-specific i-vector extractors on top of each HMM and we extract well-localized i-vectors, each modelling merely the phonetic content corresponding to a single digit. We then examine ways to perform channel and uncertainty compensation, and we propose a novel method for using the uncertainty in the i-vector estimates. The experiments on RSR2015 part III show that the proposed method attains 1.52% and 1.77% Equal Error Rate (EER) for male and female respectively, outperforming state-of-the-art…
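To make the abstract's pipeline more concrete, the sketch below shows how Baum-Welch statistics can be accumulated from frame-to-state posteriors and how a standard i-vector posterior (mean and covariance) follows from them. It is the textbook i-vector formulation with diagonal state covariances and a standard normal prior, applied to a single digit segment. All names (`baum_welch_stats`, `ivector_posterior`, `T`, `means`, `variances`) are assumptions for illustration, and the paper's specific uncertainty normalization is not reproduced here because the abstract does not specify it.

```python
import numpy as np

def baum_welch_stats(features, posteriors):
    """Zeroth- and first-order statistics for one digit segment.

    features:   (frames, dim) acoustic features aligned to this digit
    posteriors: (frames, states) state occupation probabilities per frame
    """
    N = posteriors.sum(axis=0)       # (states,)      zeroth-order statistics
    F = posteriors.T @ features      # (states, dim)  first-order statistics
    return N, F

def ivector_posterior(N, F, means, variances, T):
    """Posterior mean and covariance of the i-vector under a N(0, I) prior.

    means, variances: (states, dim) diagonal Gaussian parameters per state
    T:                (states, dim, rank) total variability matrix, per state
    """
    rank = T.shape[2]
    F_centered = F - N[:, None] * means      # centre first-order statistics
    T_scaled = T / variances[:, :, None]     # Sigma^{-1} T for diagonal Sigma

    # Precision: I + sum_s N_s * T_s^T Sigma_s^{-1} T_s
    precision = np.eye(rank) + np.einsum('s,sdr,sdq->rq', N, T_scaled, T)
    # Linear term: T^T Sigma^{-1} F_centered
    linear = np.einsum('sdr,sd->r', T_scaled, F_centered)

    covariance = np.linalg.inv(precision)    # uncertainty of the estimate
    mean = covariance @ linear               # the i-vector point estimate
    return mean, covariance

rng = np.random.default_rng(0)
feats = rng.normal(size=(50, 3))
post = rng.random(size=(50, 4))
post /= post.sum(axis=1, keepdims=True)
N, F = baum_welch_stats(feats, post)
mean, cov = ivector_posterior(N, F, rng.normal(size=(4, 3)),
                              np.ones((4, 3)), rng.normal(size=(4, 3, 2)))
```

The posterior covariance shrinks as more frames are observed, which is why short digit segments yield comparatively uncertain i-vectors and why the abstract treats that uncertainty as something to compensate for.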
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
