End-to-End Residual CNN with L-GM Loss Speaker Verification System
Xuan Shi, Xingjian Du, Mengyao Zhu

TL;DR
This paper introduces an end-to-end speaker verification system that uses a Residual CNN trained with a large-margin Gaussian Mixture (L-GM) loss and reports more than a 10% accuracy improvement over a DNN-based i-vector baseline.
Contribution
The novel integration of Residual CNN architecture with a large-margin Gaussian Mixture loss function for end-to-end speaker verification.
Findings
More than 10% accuracy improvement over a DNN-based i-vector baseline
Effective feature extraction with ResNet architecture
Enhanced verification performance due to large-margin Gaussian Mixture loss
Abstract
We propose an end-to-end speaker verification system based on a neural network and trained with a loss function of lower computational complexity. The system uses a ResNet architecture to extract features from each utterance, produces utterance-level speaker embeddings, and is trained with the large-margin Gaussian Mixture (L-GM) loss function. Through its large-margin and likelihood regularization terms, the L-GM loss benefits speaker verification performance. Experimental results show that the Residual CNN with the L-GM loss outperforms a DNN-based i-vector baseline by more than 10% in accuracy.
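To make the loss described in the abstract more concrete, the following is a minimal NumPy sketch of a large-margin Gaussian Mixture loss over utterance-level embeddings. It is an illustration under stated assumptions, not the authors' implementation: it assumes identity covariance matrices and equal class priors (so the log-determinant term vanishes), and the function name `lgm_loss`, the argument names, and the default values of `alpha` (margin) and `lam` (likelihood weight) are hypothetical choices made for this example.

```python
import numpy as np

def lgm_loss(embeddings, labels, centers, alpha=0.01, lam=0.1):
    """Simplified L-GM loss (assumes identity covariances, equal priors).

    embeddings: (N, D) utterance-level speaker embeddings
    labels:     (N,)   integer speaker ids in [0, K)
    centers:    (K, D) per-speaker Gaussian means
    alpha:      margin factor applied to the true-class distance (assumed value)
    lam:        weight of the likelihood regularization term (assumed value)
    """
    n = embeddings.shape[0]
    rows = np.arange(n)

    # Half squared Euclidean distance from every embedding to every mean: (N, K)
    diff = embeddings[:, None, :] - centers[None, :, :]
    dist = 0.5 * np.sum(diff ** 2, axis=2)

    # Large margin: inflate only the true-class distance by (1 + alpha)
    onehot = np.zeros_like(dist)
    onehot[rows, labels] = 1.0
    logits = -dist * (1.0 + alpha * onehot)

    # Numerically stable log-softmax over classes
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Classification term: cross-entropy on the margin-adjusted posteriors
    cls_loss = -log_prob[rows, labels].mean()

    # Likelihood regularization: pull each embedding toward its own class mean
    lkd_loss = dist[rows, labels].mean()

    return cls_loss + lam * lkd_loss
```

In this sketch the margin makes the classification term harder to satisfy for the true speaker, while the likelihood term penalizes embeddings that lie far from their speaker's mean. In a real system the means would be learned jointly with the ResNet parameters.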
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
