Speaker Recognition with Random Digit Strings Using Uncertainty Normalized HMM-based i-vectors

Nooshin Maghsoodi; Hossein Sameti; Hossein Zeinali; Themos Stafylakis

arXiv:1907.06111·eess.AS·July 16, 2019·1 cites

TL;DR

This paper introduces a novel HMM-based i-vector approach for text-dependent speaker recognition with random digit strings, demonstrating state-of-the-art accuracy and robustness even without extensive channel compensation.

Contribution

The paper proposes a digit-specific i-vector extraction method combined with uncertainty modeling, improving speaker recognition performance on the RSR2015 and RedDots datasets.

Findings

01

Achieves 1.52% and 1.77% EER on RSR2015 Part III for male and female speakers, respectively.

02

Outperforms x-vector systems trained on large datasets.

03

State-of-the-art results obtained with a single, simple system without extensive channel compensation.
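
The findings above are reported as Equal Error Rate (EER), the operating point at which the false-acceptance rate and the false-rejection rate coincide. As a rough illustration of the metric only, the sketch below sweeps a threshold over toy scores and returns the point where the two rates are closest. The function name `equal_error_rate` and the scores are hypothetical and do not come from the paper or its evaluation tooling.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= the threshold.
    Illustrative helper only; not the paper's scoring code.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection: a genuine (target) trial scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: an impostor (non-target) trial scored at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the midpoint of the two rates where they are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores (made up for illustration).
toy_targets = [2.0, 1.5, 0.2, 1.8]
toy_nontargets = [0.1, -0.5, 0.3, -1.0]
toy_eer = equal_error_rate(toy_targets, toy_nontargets)
```

On real trial lists the two rates rarely cross exactly at an observed score, so evaluation tools usually interpolate between neighbouring thresholds; the midpoint used here is a simplification.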

Abstract

In this paper, we combine Hidden Markov Models (HMMs) with i-vector extractors to address the problem of text-dependent speaker recognition with random digit strings. We employ digit-specific HMMs to segment the utterances into digits, to perform frame alignment to HMM states and to extract Baum-Welch statistics. By making use of the natural partition of input features into digits, we train digit-specific i-vector extractors on top of each HMM and we extract well-localized i-vectors, each modelling merely the phonetic content corresponding to a single digit. We then examine ways to perform channel and uncertainty compensation, and we propose a novel method for using the uncertainty in the i-vector estimates. The experiments on RSR2015 part III show that the proposed method attains 1.52% and 1.77% Equal Error Rate (EER) for male and female, respectively, outperforming state-of-the-art…
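
To make "the uncertainty in the i-vector estimates" concrete, the sketch below computes the standard i-vector posterior from Baum-Welch statistics: the posterior mean is the i-vector and the posterior covariance is its uncertainty. It assumes diagonal covariances and statistics already centered on the model means. In the setting described in the abstract, the components would presumably be the states of one digit-specific HMM. All names (`ivector_posterior`, `N`, `F`, `T`, `Sigma`) are illustrative, and this is the textbook posterior, not the paper's uncertainty normalization, whose details are not given here.

```python
import numpy as np


def ivector_posterior(N, F, T, Sigma):
    """Posterior mean and covariance of an i-vector (standard formulation).

    N:     (C,)       zero-order statistics (soft frame counts) per component
    F:     (C, D)     first-order statistics, centered on the component means
    T:     (C, D, R)  total-variability sub-matrix for each component
    Sigma: (C, D)     diagonal covariance for each component
    """
    R = T.shape[2]
    precision = np.eye(R)   # standard normal prior contributes the identity
    linear = np.zeros(R)
    for c in range(len(N)):
        Tc_scaled = T[c] / Sigma[c][:, None]      # Sigma_c^{-1} T_c, shape (D, R)
        precision += N[c] * (T[c].T @ Tc_scaled)  # N_c T_c^T Sigma_c^{-1} T_c
        linear += Tc_scaled.T @ F[c]              # T_c^T Sigma_c^{-1} F_c
    cov = np.linalg.inv(precision)  # posterior covariance: the uncertainty
    mean = cov @ linear             # posterior mean: the i-vector
    return mean, cov


# Toy example with random parameters (made up for illustration).
rng = np.random.default_rng(0)
C, D, R = 3, 4, 2
T_toy = rng.standard_normal((C, D, R))
Sigma_toy = np.ones((C, D))

# With no frames, the posterior falls back to the prior: zero mean, identity covariance.
mean_empty, cov_empty = ivector_posterior(np.zeros(C), np.zeros((C, D)), T_toy, Sigma_toy)

# With many frames generated from a known latent vector, the estimate tightens around it.
w_true = np.array([0.5, -1.0])
N_toy = np.full(C, 1000.0)
F_toy = np.stack([N_toy[c] * (T_toy[c] @ w_true) for c in range(C)])
mean_long, cov_long = ivector_posterior(N_toy, F_toy, T_toy, Sigma_toy)
```

Because each digit spans only a short stretch of speech, the zero-order counts are small and the posterior covariance stays comparatively large, which is one reason it can be worth carrying that covariance into later scoring stages rather than using the point estimate alone.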

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing