Designing Neural Speaker Embeddings with Meta Learning
Manoj Kumar, Tae Jin Park, Somer Bishop, Shrikanth Narayanan

TL;DR
This paper introduces a meta-learning approach to train neural speaker embeddings that generalize better to unseen speakers and challenging acoustic conditions, outperforming traditional methods in speaker diarization and verification tasks.
Contribution
It reformulates speaker embedding training using meta-learning, develops an open-source toolkit, and demonstrates improved performance across multiple datasets and challenging scenarios.
Findings
Up to 12.37% relative improvement in speaker error for speaker diarization.
Meta-learning helps most under challenging acoustic conditions.
Lower equal error rate (EER) in speaker verification.
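The two metrics named in these findings can be illustrated with a short sketch. It assumes a simple threshold sweep for the equal error rate and the usual (baseline - new) / baseline definition of relative improvement. The function names and all numbers are hypothetical and are not taken from the paper or its toolkit.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER: the operating point where the false-accept rate
    and the false-reject rate are closest, reported as their average."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = None, None
    for t in thresholds:
        # A trial is accepted when its score is >= the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_improvement(baseline_error, new_error):
    """Fraction of the baseline error removed by the new system."""
    return (baseline_error - new_error) / baseline_error


# Made-up similarity scores for same-speaker and different-speaker trials.
same_speaker = [0.9, 0.8, 0.7, 0.4]
different_speaker = [0.6, 0.3, 0.2, 0.1]
eer = equal_error_rate(same_speaker, different_speaker)

# Made-up error rates: 8.0% baseline versus 7.0% for a new system.
gain = relative_improvement(8.0, 7.0)
```

Production evaluations typically interpolate between thresholds rather than picking the closest discrete point, so this sketch is only meant to show what the metric measures.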
Abstract
Neural speaker embeddings trained using classification objectives have demonstrated state-of-the-art performance in multiple applications. Typically, such embeddings are trained on an out-of-domain corpus on a single task e.g., speaker classification, albeit with a large number of classes (speakers). In this work, we reformulate embedding training under the meta-learning paradigm. We redistribute the training corpus as an ensemble of multiple related speaker classification tasks, and learn a representation that generalizes better to unseen speakers. First, we develop an open source toolkit to train x-vectors that is matched in performance with pre-trained Kaldi models for speaker diarization and speaker verification applications. We find that different bottleneck layers in the architecture variedly favor different applications. Next, we use two meta-learning strategies, namely…
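The idea of redistributing a corpus into many small speaker classification tasks can be sketched as episodic sampling. This is a minimal illustration under stated assumptions, not the authors' toolkit: the function `sample_episode`, its parameters (`n_speakers`, `n_support`, `n_query`), and the toy corpus are all hypothetical.

```python
import random
from collections import defaultdict


def sample_episode(utterances, n_speakers=5, n_support=1, n_query=2, rng=None):
    """Draw one small speaker-classification task from a labeled corpus.

    utterances: list of (speaker_id, utterance_id) pairs.
    Returns (support, query), each a list of (utterance_id, episode_label)
    pairs with episode_label in range(n_speakers).
    """
    rng = rng or random.Random()
    by_speaker = defaultdict(list)
    for speaker_id, utt_id in utterances:
        by_speaker[speaker_id].append(utt_id)

    # Only speakers with enough utterances can fill both sets.
    needed = n_support + n_query
    eligible = sorted(s for s, utts in by_speaker.items() if len(utts) >= needed)
    if len(eligible) < n_speakers:
        raise ValueError("not enough speakers with enough utterances")

    # Each episode relabels its speakers 0..n_speakers-1, so the model
    # cannot rely on a fixed global speaker index.
    chosen = rng.sample(eligible, n_speakers)
    support, query = [], []
    for label, speaker_id in enumerate(chosen):
        picked = rng.sample(by_speaker[speaker_id], needed)
        support += [(u, label) for u in picked[:n_support]]
        query += [(u, label) for u in picked[n_support:]]
    return support, query


# Toy corpus: 20 speakers with 6 utterances each (identifiers are made up).
corpus = [(f"spk{s}", f"spk{s}_utt{u}") for s in range(20) for u in range(6)]
support, query = sample_episode(corpus, rng=random.Random(0))
```

Repeating this draw many times yields the ensemble of related tasks. A training loop would then fit the embedding network so that query utterances are classified correctly given only the support utterances of that episode.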
Taxonomy
Topics: Speech Recognition and Synthesis · Topic Modeling · Music and Audio Processing
