Comparison of Multiple Features and Modeling Methods for Text-dependent Speaker Verification

Yi Liu, Liang He, Yao Tian, Zhuzi Chen, Jia Liu, and Michael T. Johnson

arXiv:1707.04373 · cs.SD · September 12, 2017 · 1 citation

TL;DR

This study compares four modeling methods for text-dependent speaker verification on the RedDots dataset, analyzing the impact of frame alignment algorithms and features, and finds that HMM-based models excel with fixed phrases, while bottleneck features are less effective in challenging scenarios.

Contribution

Introduces and compares four modeling methods for text-dependent speaker verification, analyzing the effects of frame alignment and features on performance.

Findings

01. HMM-based models perform well with fixed phrases.

02. Forward-backward algorithm benefits i-vector/HMM systems.

03. Bottleneck features do not outperform MFCCs in challenging trials.

Abstract

Text-dependent speaker verification is becoming popular in the speaker recognition community. However, the conventional i-vector framework, which has been successful for speaker identification and other similar tasks, works relatively poorly in this task. Researchers have proposed several new methods to improve performance, but it is still unclear which model is the best choice, especially when the pass-phrases are prompted during enrollment and test. In this paper, we introduce four modeling methods and compare their performance on the newly published RedDots dataset. To further explore the influence of different frame alignments, both the Viterbi and forward-backward algorithms are used in the HMM-based models. Several bottleneck features are also investigated. Our experiments show that, by explicitly modeling the lexical content, the HMM-based modeling achieves good results in the…
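
The abstract contrasts two frame alignment strategies for HMM-based models. As a rough illustration only, the sketch below computes both on a toy 3-state HMM: Viterbi assigns each frame to exactly one state (hard alignment), while forward-backward gives each frame a posterior over all states (soft alignment). The transition matrix, the random emission scores, and the function names `viterbi` and `forward_backward` are assumptions made for this example and are not taken from the paper's systems.

```python
import numpy as np


def viterbi(log_b, log_A, log_pi):
    """Hard alignment: most likely state index for every frame."""
    T, S = log_b.shape
    delta = log_pi + log_b[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        # scores[i, j]: best path ending in state i, then moving i -> j
        scores = delta[:, None] + log_A
        back[t] = np.argmax(scores, axis=0)
        delta = scores[back[t], np.arange(S)] + log_b[t]
    path = np.empty(T, dtype=int)
    path[-1] = np.argmax(delta)
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path


def forward_backward(log_b, log_A, log_pi):
    """Soft alignment: posterior gamma[t, s] of state s at frame t."""
    T, S = log_b.shape
    log_alpha = np.empty((T, S))
    log_beta = np.zeros((T, S))
    log_alpha[0] = log_pi + log_b[0]
    for t in range(1, T):
        log_alpha[t] = log_b[t] + np.logaddexp.reduce(
            log_alpha[t - 1][:, None] + log_A, axis=0)
    for t in range(T - 2, -1, -1):
        log_beta[t] = np.logaddexp.reduce(
            log_A + (log_b[t + 1] + log_beta[t + 1])[None, :], axis=1)
    log_gamma = log_alpha + log_beta
    log_gamma -= np.logaddexp.reduce(log_gamma, axis=1, keepdims=True)
    return np.exp(log_gamma)


# Toy setup (all values are illustrative assumptions).
T, S = 8, 3
rng = np.random.default_rng(0)
log_b = rng.normal(size=(T, S))            # stand-in frame log-likelihoods
log_A = np.log(np.array([[0.70, 0.29, 0.01],   # near left-to-right topology
                         [0.01, 0.70, 0.29],
                         [0.01, 0.01, 0.98]]))
log_pi = np.log(np.array([0.98, 0.01, 0.01]))

path = viterbi(log_b, log_A, log_pi)
hard = np.eye(S)[path]                     # one-hot occupancy per frame
gamma = forward_backward(log_b, log_A, log_pi)

print("Viterbi path:", path)
print("Soft occupancies:\n", gamma.round(3))
```

In both cases the per-frame weights sum to one, so the two alignments differ only in how they spread each frame across states. Statistics accumulated per state, such as those used in i-vector extraction, would be weighted by `hard` in the first case and by `gamma` in the second.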

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and Audio Processing