A Novel Learnable Dictionary Encoding Layer for End-to-End Language Identification
Weicheng Cai, Zexin Cai, Xiang Zhang, Xiaoqi Wang, Ming Li

TL;DR
This paper introduces a learnable dictionary encoding layer for end-to-end language identification, enabling joint learning of dictionaries and representations directly from data, improving accuracy over traditional pooling methods.
Contribution
It proposes a novel, end-to-end trainable dictionary encoding layer that mimics GMM i-vector mechanisms within a CNN framework for language ID.
Findings
Achieved significant error reduction compared to average pooling.
Demonstrated the effectiveness of the learnable encoding layer on NIST LRE07.
Provided a unified framework for language identification with joint dictionary learning.
Abstract
A novel learnable dictionary encoding layer is proposed in this paper for end-to-end language identification. It is in line with the conventional GMM i-vector approach both theoretically and practically. We imitate the mechanism of the traditional GMM training and Supervector encoding procedure on top of a CNN. The proposed layer can accumulate high-order statistics from a variable-length input sequence and generate an utterance-level, fixed-dimensional vector representation. Unlike the conventional methods, our new approach provides an end-to-end learning framework, where the inherent dictionaries are learned directly from the loss function. The dictionaries and the encoding representation for the classifier are learned jointly. The representation is orderless and therefore appropriate for language identification. We conducted a preliminary experiment on the NIST LRE07 closed-set task, and the…
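To make the encoding idea concrete, here is a minimal NumPy sketch of the forward pass of such a layer. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not give the formula, so the sketch assumes a soft assignment of each frame to dictionary components followed by averaging of the weighted residuals. The names `lde_encode`, `centers`, and `scales` are hypothetical. In an end-to-end system `centers` and `scales` would be trainable parameters updated by backpropagation, whereas here they are fixed random values.

```python
import numpy as np

def lde_encode(frames, centers, scales):
    """Encode a variable-length sequence into a fixed-size vector.

    frames:  (T, D) frame-level features, e.g. CNN outputs (T may vary).
    centers: (C, D) dictionary components (assumed learnable in training).
    scales:  (C,)   positive smoothing factors (assumed learnable).
    Returns a (C * D,) utterance-level representation.
    """
    # Residual of every frame with respect to every dictionary component.
    residuals = frames[:, None, :] - centers[None, :, :]      # (T, C, D)
    sq_dist = np.sum(residuals ** 2, axis=2)                  # (T, C)

    # Soft assignment: softmax over components of the scaled negative distance.
    logits = -scales[None, :] * sq_dist
    logits = logits - logits.max(axis=1, keepdims=True)       # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)    # (T, C)

    # Aggregate weighted residuals over time (assumption: normalize by T).
    encoded = np.einsum("tc,tcd->cd", weights, residuals) / frames.shape[0]
    return encoded.reshape(-1)

rng = np.random.default_rng(0)
centers = rng.normal(size=(4, 8))     # C = 4 components, D = 8 dimensions
scales = np.ones(4)

short_utt = rng.normal(size=(50, 8))  # 50 frames
long_utt = rng.normal(size=(300, 8))  # 300 frames

enc_short = lde_encode(short_utt, centers, scales)
enc_long = lde_encode(long_utt, centers, scales)

# Shuffling the frames should not change the encoding (orderless property).
shuffled = lde_encode(short_utt[rng.permutation(50)], centers, scales)
```

Both utterances map to vectors of the same length (C × D), and the shuffled input yields the same encoding because the aggregation is a sum over frames. These are the fixed-dimensional and orderless properties the abstract describes.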
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing
