Exploring the Encoding Layer and Loss Function in End-to-End Speaker and Language Recognition System
Weicheng Cai, Jinkun Chen, Ming Li

TL;DR
This paper investigates the impact of different encoding layers and loss functions on end-to-end speaker and language recognition systems, proposing novel pooling and encoding methods that improve performance on standard datasets.
Contribution
It introduces a unified, interpretable end-to-end system with novel encoding layers and loss functions for enhanced speaker and language recognition.
Findings
Self-attentive pooling improves utterance representation.
Learnable dictionary encoding enhances discriminability.
The proposed methods outperform the baseline on Voxceleb and NIST LRE 07.
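To illustrate the difference between plain temporal average pooling and self-attentive pooling, here is a minimal NumPy sketch. It assumes one common formulation of attention pooling (a tanh projection scored against a learnable context vector, then a softmax over time). The parameter names `W`, `b`, `u`, the dimensions, and the random inputs are hypothetical and not taken from the paper.

```python
import numpy as np

def temporal_average_pooling(frames):
    """Average frame-level features over time: (T, D) -> (D,)."""
    return frames.mean(axis=0)

def self_attentive_pooling(frames, W, b, u):
    """Attention-weighted average over time: (T, D) -> (D,).

    W (D, H), b (H,), and u (H,) stand in for learnable parameters;
    these names and shapes are assumptions made for this sketch.
    """
    hidden = np.tanh(frames @ W + b)        # (T, H) per-frame projection
    scores = hidden @ u                     # (T,) one scalar score per frame
    scores = scores - scores.max()          # shift for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over time
    return weights @ frames, weights

# Toy input: T frames of D-dimensional features (random, for illustration only).
rng = np.random.default_rng(0)
T, D, H = 50, 8, 4
frames = rng.normal(size=(T, D))
W = rng.normal(size=(D, H))
b = np.zeros(H)
u = rng.normal(size=H)

avg = temporal_average_pooling(frames)
sap, weights = self_attentive_pooling(frames, W, b, u)
```

Both functions accept any number of frames `T` and return a fixed-size vector, which is what lets the system handle variable-length input. When all attention scores are equal (for example `u` set to zeros), the attention weights become uniform and the result reduces to the temporal average.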
Abstract
In this paper, we explore the encoding/pooling layer and loss function in the end-to-end speaker and language recognition system. First, a unified and interpretable end-to-end system for both speaker and language recognition is developed. It accepts variable-length input and produces an utterance-level result. In the end-to-end system, the encoding layer aggregates the variable-length input sequence into an utterance-level representation. Besides the basic temporal average pooling, we introduce a self-attentive pooling layer and a learnable dictionary encoding layer to get the utterance-level representation. In terms of the loss function for open-set speaker verification, to get more discriminative speaker embeddings, center loss and angular softmax loss are introduced in the end-to-end system. Experimental results on Voxceleb and NIST LRE 07 datasets show that the…
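As a rough illustration of the center loss mentioned in the abstract, the sketch below computes half the squared distance between each embedding and the center of its class, averaged over the batch. The batch averaging, the function name, and the toy values are assumptions for this example. In practice such a term is typically added to a softmax loss with a weighting factor, and the centers are updated during training, which is not shown here.

```python
import numpy as np

def center_loss(embeddings, labels, centers):
    """Half the squared distance to each sample's class center, batch-averaged.

    embeddings: (N, D) utterance-level embeddings
    labels:     (N,)   integer class (speaker) indices
    centers:    (C, D) one center per class (hypothetical, fixed here)
    """
    diffs = embeddings - centers[labels]    # (N, D) offset from own class center
    return 0.5 * np.sum(diffs ** 2) / len(embeddings)

# Toy example: two samples, two classes, 2-D embeddings.
embeddings = np.array([[1.0, 0.0],
                       [0.0, 1.0]])
labels = np.array([0, 1])
centers = np.array([[0.0, 0.0],
                    [0.0, 1.0]])

loss = center_loss(embeddings, labels, centers)
```

The loss is zero when every embedding sits exactly on its class center, so minimizing it pulls embeddings of the same speaker together, which is the sense in which it encourages more discriminative embeddings.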
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Natural Language Processing Techniques
Methods: Softmax
