End-to-End Residual CNN with L-GM Loss Speaker Verification System

Xuan Shi; Xingjian Du; Mengyao Zhu

arXiv:1805.00645 · cs.SD · September 5, 2018

TL;DR

This paper introduces an end-to-end speaker verification system that uses a Residual CNN trained with a large-margin Gaussian Mixture (L-GM) loss, and reports more than 10% higher accuracy than a DNN-based i-vector baseline.

Contribution

The paper integrates a Residual CNN architecture with the large-margin Gaussian Mixture loss function for end-to-end speaker verification.

Findings

01. Over 10% accuracy improvement over DNN-based i-vector baseline

02. Effective feature extraction with ResNet architecture

03. Enhanced verification performance due to large-margin Gaussian Mixture loss

Abstract

We propose an end-to-end speaker verification system based on a neural network and trained with a loss function of lower computational complexity. The system uses a ResNet architecture to extract features from an utterance, produces utterance-level speaker embeddings, and is trained with the large-margin Gaussian Mixture (L-GM) loss function. Owing to its large margin and likelihood regularization, the L-GM loss improves speaker verification performance. Experimental results show that the Residual CNN with L-GM loss outperforms a DNN-based i-vector baseline by more than 10% in accuracy.
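
The abstract names the two ingredients of the L-GM loss, a large margin and a likelihood regularization, without giving a formula. The following is a minimal sketch of how such a loss could be computed for a single utterance-level embedding. It assumes the commonly used simplified form with identity covariance matrices and equal class priors, which may differ from the configuration used in the paper. The function name `lgm_loss`, the parameter names `alpha` (margin) and `lam` (regularization weight), and their default values are hypothetical and chosen for illustration only.

```python
import math


def lgm_loss(embedding, means, label, alpha=0.3, lam=0.1):
    """Sketch of an L-GM loss for one embedding.

    Assumes identity covariances and equal priors; alpha and lam
    defaults are illustrative, not values from the paper.
    """
    # Half squared Euclidean distance from the embedding to each speaker mean.
    dists = [
        0.5 * sum((x - m) ** 2 for x, m in zip(embedding, mu))
        for mu in means
    ]

    # Large margin: enlarge the true-class distance by a factor (1 + alpha),
    # so the embedding must be closer to its own mean to be classified well.
    logits = [
        -d * (1.0 + alpha) if k == label else -d
        for k, d in enumerate(dists)
    ]

    # Softmax cross-entropy over the margin-adjusted logits,
    # using log-sum-exp for numerical stability.
    top = max(logits)
    log_norm = top + math.log(sum(math.exp(z - top) for z in logits))
    classification = log_norm - logits[label]

    # Likelihood regularization: negative log-likelihood under the true
    # class (up to a constant), pulling the embedding toward its mean.
    likelihood = dists[label]

    return classification + lam * likelihood


# Two hypothetical speaker means in a 2-D embedding space.
means = [[0.0, 0.0], [4.0, 0.0]]
loss_near = lgm_loss([0.2, 0.1], means, label=0)  # close to its own mean
loss_far = lgm_loss([3.0, 0.0], means, label=0)   # closer to the other mean
print(loss_near, loss_far)
```

With `alpha=0` and `lam=0` the sketch reduces to an ordinary softmax cross-entropy over negative distances, which makes the separate contribution of each term easy to inspect.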

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing