Challenging margin-based speaker embedding extractors by using the variational information bottleneck
Themos Stafylakis, Anna Silnova, Johan Rohdin, Oldrich Plchot, Lukas Burget

TL;DR
This paper uses the variational information bottleneck to challenge margin-based speaker embedding extractors. Making deterministic nodes stochastic implicitly reduces the posterior of the target speaker during training, and the approach achieves results competitive with the Additive Angular Margin loss.
Contribution
It introduces a novel probabilistic framework using the variational information bottleneck to improve speaker embedding extraction, challenging existing margin-based methods.
Findings
Achieves results competitive with those of the state-of-the-art Additive Angular Margin loss across a wide range of speaker recognition benchmarks and scoring methods.
Demonstrates the effectiveness of the variational information bottleneck in speaker recognition.
Provides insights into probabilistic modeling for speaker embeddings.
Abstract
Speaker embedding extractors are typically trained using a classification loss over the training speakers. During the last few years, the standard softmax/cross-entropy loss has been replaced by margin-based losses, yielding significant improvements in speaker recognition accuracy. Motivated by the fact that the margin merely reduces the logit of the target speaker during training, we consider a probabilistic framework that has a similar effect. The variational information bottleneck provides a principled mechanism for making deterministic nodes stochastic, resulting in an implicit reduction of the posterior of the target speaker. We experiment with a wide range of speaker recognition benchmarks and scoring methods and report results competitive with those obtained with the state-of-the-art Additive Angular Margin loss.
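The two mechanisms the abstract contrasts can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the margin and scale values, the choice of a standard normal prior, and the choice of a diagonal Gaussian over the embedding as the stochastic node are all assumptions made for illustration. The first function applies an additive angular margin to the target speaker's logit only. The other two show the usual variational information bottleneck ingredients: a reparameterized sample and a KL regularizer.

```python
import numpy as np


def aam_logits(embedding, class_weights, target, margin=0.2, scale=30.0):
    """Cosine logits with an additive angular margin on the target class.

    margin and scale are illustrative values, not taken from the paper.
    """
    e = embedding / np.linalg.norm(embedding)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    cos = np.clip(w @ e, -1.0, 1.0)
    theta = np.arccos(cos)
    # Only the target speaker's angle is enlarged, which lowers its logit.
    # Capping at pi keeps the cosine monotonically decreasing.
    theta[target] = min(theta[target] + margin, np.pi)
    return scale * np.cos(theta)


def vib_sample(mean, log_var, rng):
    """Reparameterized sample z = mean + std * eps from a diagonal Gaussian."""
    std = np.exp(0.5 * log_var)
    return mean + std * rng.standard_normal(mean.shape)


def kl_to_standard_normal(mean, log_var):
    """KL( N(mean, diag(exp(log_var))) || N(0, I) ), the VIB regularizer."""
    return 0.5 * np.sum(np.exp(log_var) + mean**2 - 1.0 - log_var)


# Toy setup: 5 hypothetical speakers, 8-dimensional embeddings.
rng = np.random.default_rng(0)
weights = rng.standard_normal((5, 8))
mean = weights[2] + 0.1 * rng.standard_normal(8)  # embedding near speaker 2
log_var = np.full(8, -1.0)                        # assumed predicted log-variance

margin_logits = aam_logits(mean, weights, target=2)
noisy_embedding = vib_sample(mean, log_var, rng)
vib_logits = aam_logits(noisy_embedding, weights, target=2, margin=0.0)
penalty = kl_to_standard_normal(mean, log_var)
```

In the margin case the target logit is lowered explicitly and deterministically. In the bottleneck case no margin is applied, and any reduction of the target posterior comes from the sampling noise, with the KL term controlling how much noise the model keeps.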
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
