Deep Speaker: an End-to-End Neural Speaker Embedding System
Chao Li, Xiaokong Ma, Bing Jiang, Xiangang Li, Xuewei Zhang, Xiao Liu, Ying Cao, Ajay Kannan, Zhenyao Zhu

TL;DR
Deep Speaker is an end-to-end neural speaker embedding system, built on ResCNN and GRU architectures and trained with a cosine-similarity triplet loss, that outperforms a DNN-based i-vector baseline on speaker verification and identification.
Contribution
It presents a novel neural speaker embedding system with end-to-end training, outperforming traditional i-vector baselines in speaker recognition tasks.
Findings
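Reduces verification equal error rate (EER) by 50% (relative) compared with a DNN-based i-vector baseline on a text-independent dataset
Improves identification accuracy by 60% (relative) on the same text-independent dataset
Adapting from a Mandarin-trained model can improve accuracy for English speaker recognition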
Abstract
We present Deep Speaker, a neural speaker embedding system that maps utterances to a hypersphere where speaker similarity is measured by cosine similarity. The embeddings generated by Deep Speaker can be used for many tasks, including speaker identification, verification, and clustering. We experiment with ResCNN and GRU architectures to extract the acoustic features, then mean pool to produce utterance-level speaker embeddings, and train using triplet loss based on cosine similarity. Experiments on three distinct datasets suggest that Deep Speaker outperforms a DNN-based i-vector baseline. For example, Deep Speaker reduces the verification equal error rate by 50% (relatively) and improves the identification accuracy by 60% (relatively) on a text-independent dataset. We also present results that suggest adapting from a model trained with Mandarin can improve accuracy for English speaker…
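The pipeline described in the abstract (mean pooling of frame-level features, projection onto the unit hypersphere, and a triplet loss on cosine similarity) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the frame-level features have already been produced by some network such as the ResCNN or GRU, the function names are hypothetical, and the margin value of 0.1 is an assumed placeholder rather than a figure taken from this page.

```python
import math


def mean_pool(frames):
    """Average frame-level feature vectors into one utterance-level vector."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / n for i in range(dim)]


def l2_normalize(vec):
    """Scale a vector to unit length so it lies on the hypersphere."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def embed(frames):
    """Utterance-level embedding: mean pool, then length-normalize."""
    return l2_normalize(mean_pool(frames))


def cosine_similarity(a, b):
    """For unit-length inputs the dot product equals the cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.1):
    """Hinge loss that is zero once the anchor-positive similarity exceeds
    the anchor-negative similarity by at least `margin` (assumed value)."""
    s_ap = cosine_similarity(anchor, positive)
    s_an = cosine_similarity(anchor, negative)
    return max(0.0, s_an - s_ap + margin)


# Toy usage with made-up 2-dimensional frame features.
anchor = embed([[1.0, 0.1], [0.9, 0.0]])    # speaker A, utterance 1
positive = embed([[0.8, 0.1], [1.0, 0.2]])  # speaker A, utterance 2
negative = embed([[0.0, 1.0], [0.1, 0.9]])  # speaker B
loss = triplet_loss(anchor, positive, negative)
```

In this toy case the anchor is already much closer to the positive than to the negative, so the loss is zero; swapping the positive and negative arguments yields a positive loss, which is the situation training would try to correct.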
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
Methods: Gated Recurrent Unit
