Investigation of Frame Alignments for GMM-based Digit-prompted Speaker Verification

Yi Liu; Liang He; Weiqiang Zhang; Jia Liu; Michael T. Johnson

arXiv:1710.10436 · cs.SD · September 5, 2018

TL;DR

This paper compares DNN and HMM frame alignments in GMM-based digit-prompted speaker verification and introduces a KL divergence scoring method to enhance security against incorrect pass-phrases.

Contribution

It presents a novel content verification approach using KL divergence to improve system security with minimal computational overhead.

Findings

01 DNN and HMM alignments perform similarly in verification accuracy.

02 KL divergence scoring effectively rejects incorrect pass-phrases.

03 Proposed method enhances security without significant computational cost.

Abstract

Frame alignments can be computed by different methods in GMM-based speaker verification. By incorporating a phonetic Gaussian mixture model (PGMM), we are able to compare the performance of alignments extracted from a deep neural network (DNN) and from a conventional hidden Markov model (HMM) in digit-prompted speaker verification. Based on the different characteristics of these two alignments, we present a novel content verification method that improves system security without much computational overhead. Our experiments on the RSR2015 Part-3 digit-prompted task show that the DNN-based alignment performs on par with the HMM alignment. The results also demonstrate the effectiveness of the proposed Kullback-Leibler (KL) divergence-based scoring in rejecting speech with incorrect pass-phrases.
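The abstract does not spell out how the KL divergence score is computed, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes each alignment is available as a matrix of per-frame posterior probabilities over the same set of classes, and it averages the frame-level KL divergence between the two. The function names `kl_divergence` and `mean_frame_kl`, the smoothing constant `eps`, and the toy posteriors are all hypothetical.

```python
import numpy as np


def kl_divergence(p, q, eps=1e-10):
    """KL(p || q) along the last axis for discrete distributions.

    eps is an assumed smoothing constant that avoids log(0); both inputs
    are renormalised after smoothing so each row sums to one.
    """
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p = p / p.sum(axis=-1, keepdims=True)
    q = q / q.sum(axis=-1, keepdims=True)
    return np.sum(p * np.log(p / q), axis=-1)


def mean_frame_kl(post_a, post_b):
    """Average frame-level KL divergence between two alignments.

    post_a and post_b are (num_frames, num_classes) posterior matrices,
    standing in here for two different frame alignments of one utterance.
    """
    return float(np.mean(kl_divergence(post_a, post_b)))


if __name__ == "__main__":
    # Toy posteriors: 3 frames, 4 classes (illustrative values only).
    agree = np.array([[0.90, 0.05, 0.03, 0.02]] * 3)
    disagree = agree[:, ::-1]  # same frames with the class order reversed
    print("matching alignments:   ", mean_frame_kl(agree, agree))
    print("mismatched alignments: ", mean_frame_kl(agree, disagree))
```

Under these assumptions, a small average divergence means the two alignments agree on the content, while a large value suggests a mismatch that a threshold could flag. Any such threshold would have to be tuned on held-out data.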

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing