Challenging margin-based speaker embedding extractors by using the variational information bottleneck

Themos Stafylakis, Anna Silnova, Johan Rohdin, Oldrich Plchot, Lukas Burget

arXiv:2406.12622 · eess.AS · June 19, 2024 · Interspeech

Open Access

TL;DR

This paper uses the variational information bottleneck to challenge margin-based speaker embedding extractors. It proposes a probabilistic approach that implicitly reduces the posterior of the target speaker during training and achieves results competitive with margin-based losses.

Contribution

It introduces a probabilistic framework based on the variational information bottleneck for training speaker embedding extractors, as an alternative to existing margin-based losses.
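To make the framework concrete, here is a minimal NumPy sketch of the commonly used deep VIB objective: the extractor outputs a mean and log-variance per embedding, a sample is drawn with the reparameterization trick, and training minimizes speaker cross-entropy on that sample plus a beta-weighted KL term to a standard normal prior. The linear classifier `W`, the prior, the value of `beta`, and all function names are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def log_softmax(z):
    # Numerically stable log-softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def kl_to_standard_normal(mu, log_var):
    # Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over embedding dims.
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

def vib_loss(mu, log_var, W, targets, beta=1e-3, rng=None):
    # mu, log_var: (batch, dim) outputs of a hypothetical embedding extractor.
    # W: (n_speakers, dim) weights of a hypothetical linear speaker classifier.
    rng = np.random.default_rng(0) if rng is None else rng
    # Reparameterization trick: the deterministic embedding node becomes a sample.
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * log_var) * eps
    logp = log_softmax(z @ W.T)
    ce = -logp[np.arange(len(targets)), targets].mean()
    kl = kl_to_standard_normal(mu, log_var).mean()
    return ce + beta * kl, ce, kl

# Usage with random placeholder data (4 utterances, 16-dim embeddings, 10 speakers).
rng = np.random.default_rng(1)
mu = rng.standard_normal((4, 16))
log_var = np.full((4, 16), -1.0)
W = rng.standard_normal((10, 16))
targets = np.array([0, 3, 3, 7])
total, ce, kl = vib_loss(mu, log_var, W, targets, beta=0.01)
```

The KL term penalizes embeddings that carry more information than the prior allows, while the sampling step is what turns the deterministic embedding into a stochastic one.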

Findings

01

Achieves results competitive with the state-of-the-art Additive Angular Margin loss.

02

Demonstrates the effectiveness of the variational information bottleneck in speaker recognition.

03

Provides insights into probabilistic modeling for speaker embeddings.

Abstract

Speaker embedding extractors are typically trained using a classification loss over the training speakers. During the last few years, the standard softmax/cross-entropy loss has been replaced by margin-based losses, yielding significant improvements in speaker recognition accuracy. Motivated by the fact that the margin merely reduces the logit of the target speaker during training, we consider a probabilistic framework that has a similar effect. The variational information bottleneck provides a principled mechanism for making deterministic nodes stochastic, resulting in an implicit reduction of the posterior of the target speaker. We experiment with a wide range of speaker recognition benchmarks and scoring methods and report results competitive with those obtained with the state-of-the-art Additive Angular Margin loss.
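The two effects described in the abstract can be compared with a small NumPy sketch. The Additive Angular Margin (AAM) loss replaces the target cosine cos(θ_t) with cos(θ_t + m), lowering the target logit explicitly. With a stochastic embedding and, as assumed here, a linear classifier on top, the mean of the logits equals the logits of the mean embedding, and because log-sum-exp is convex, Jensen's inequality gives E[log p_t] ≤ log p_t evaluated at the mean: the target posterior is reduced on average without any margin. The scale 30, margin 0.2, linear classifier, and Monte Carlo estimate are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def log_softmax(z):
    # Numerically stable log-softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def aam_logits(cos_theta, target, margin=0.2, scale=30.0):
    # Additive Angular Margin: cos(theta_t) -> cos(theta_t + m) for the target speaker only.
    # Full implementations also handle theta_t + m > pi; that case is omitted here.
    logits = np.array(cos_theta, dtype=float)
    theta_t = np.arccos(np.clip(logits[target], -1.0, 1.0))
    logits[target] = np.cos(theta_t + margin)
    return scale * logits

def expected_log_posterior(mu, log_var, W, target, n_samples=20000, seed=0):
    # Monte Carlo estimate of E[log softmax(W z)[target]], z ~ N(mu, diag(exp(log_var))).
    rng = np.random.default_rng(seed)
    z = mu + np.exp(0.5 * log_var) * rng.standard_normal((n_samples, mu.shape[0]))
    return log_softmax(z @ W.T)[:, target].mean()

# Margin: the target logit (and hence its log-posterior) is reduced explicitly.
cos_theta = np.array([0.1, 0.8, 0.3])
plain = log_softmax(30.0 * cos_theta)[1]
with_margin = log_softmax(aam_logits(cos_theta, target=1))[1]

# Stochastic embedding: the target log-posterior is reduced on average (Jensen's inequality).
rng = np.random.default_rng(1)
W = rng.standard_normal((5, 8))  # hypothetical 5-speaker linear classifier
mu = 0.5 * W[2]                  # mean embedding pointing toward speaker 2
deterministic = log_softmax(W @ mu)[2]
stochastic = expected_log_posterior(mu, np.zeros(8), W, target=2)
```

In both cases the training signal pushes the extractor to make the target speaker more separable: the margin does so with a fixed penalty, while the stochastic embedding does so through sampling noise whose size the model itself controls.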

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing