Investigation of Frame Alignments for GMM-based Digit-prompted Speaker Verification
Yi Liu, Liang He, Weiqiang Zhang, Jia Liu, Michael T. Johnson

TL;DR
This paper compares DNN and HMM frame alignments in GMM-based digit-prompted speaker verification and introduces a KL divergence scoring method to enhance security against incorrect pass-phrases.
Contribution
It presents a novel content verification approach using KL divergence to improve system security with minimal computational overhead.
Findings
DNN and HMM alignments perform similarly in verification accuracy.
KL divergence scoring effectively rejects incorrect pass-phrases.
The proposed method enhances security without significant computational cost.
Abstract
Frame alignments can be computed by different methods in GMM-based speaker verification. By incorporating a phonetic Gaussian mixture model (PGMM), we are able to compare the performance of alignments extracted from deep neural networks (DNN) and from the conventional hidden Markov model (HMM) in digit-prompted speaker verification. Based on the different characteristics of these two alignments, we present a novel content verification method that improves system security without much computational overhead. Our experiments on the RSR2015 Part-3 digit-prompted task show that the DNN-based alignment performs on par with the HMM alignment. The results also demonstrate the effectiveness of the proposed Kullback-Leibler (KL) divergence-based scoring in rejecting speech with incorrect pass-phrases.
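The abstract does not spell out how the KL divergence score is computed, so the following is only a minimal sketch of one plausible reading: compare the per-frame posterior distributions produced by two alignment methods and average the frame-wise KL divergence, treating a large value as a sign that the spoken content does not match the prompt. The function names, the toy posteriors, the direction of the divergence, and the idea of thresholding the average are all assumptions made for illustration, not the authors' formulation.

```python
import math


def kl_divergence(p, q, eps=1e-10):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            # Floor q to avoid division by zero (assumed smoothing choice).
            total += pi * math.log(pi / max(qi, eps))
    return total


def mean_frame_kl(post_a, post_b):
    """Average frame-wise KL divergence between two posterior sequences."""
    if len(post_a) != len(post_b) or not post_a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(kl_divergence(a, b) for a, b in zip(post_a, post_b)) / len(post_a)


# Hypothetical toy posteriors: 3 frames over 3 phonetic classes.
dnn_post = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
# HMM alignment forced to the prompted (correct) digit sequence.
hmm_post_correct = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.05, 0.05, 0.9]]
# HMM alignment forced to a prompt that does not match the speech.
hmm_post_wrong = [[0.05, 0.05, 0.9], [0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]

score_match = mean_frame_kl(dnn_post, hmm_post_correct)
score_mismatch = mean_frame_kl(dnn_post, hmm_post_wrong)
```

In this toy setup the mismatched prompt yields a much larger average divergence than the matching one, which is the kind of separation a rejection threshold could exploit. The actual features, alignment units, and threshold used in the paper may differ.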
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
