Label-Efficient Self-Supervised Speaker Verification With Information Maximization and Contrastive Learning
Théo Lepage, Réda Dehak

TL;DR
This paper proposes a self-supervised learning approach for speaker verification that leverages information maximization and contrastive learning to produce robust speaker embeddings without extensive labeled data.
Contribution
It introduces a novel self-supervised framework combining information maximization and contrastive learning for speaker verification, reducing reliance on labeled datasets.
Findings
Achieves results competitive with existing methods.
Outperforms the supervised baseline when fine-tuned with limited labeled data.
Effective data augmentation enhances embedding robustness.
Abstract
State-of-the-art speaker verification systems are inherently dependent on some kind of human supervision as they are trained on massive amounts of labeled data. However, manually annotating utterances is slow, expensive and not scalable to the amount of data available today. In this study, we explore self-supervised learning for speaker verification by learning representations directly from raw audio. The objective is to produce robust speaker embeddings that have small intra-speaker and large inter-speaker variance. Our approach is based on recent information maximization learning frameworks and an intensive data augmentation pre-processing step. We evaluate the ability of these methods to work without contrastive samples before showing that they achieve better performance when combined with a contrastive loss. Furthermore, we conduct experiments to show that our method reaches…
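The combination described in the abstract, an information maximization objective plus a contrastive loss, can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not name the frameworks it uses, so a VICReg-style objective (invariance, variance, and covariance terms) stands in here as one example of information maximization, and InfoNCE stands in for the contrastive loss. The function names, the weights `lam`, `mu`, `nu`, `alpha`, and the temperature are hypothetical placeholders, not values from the paper. Each row of `za` and `zb` is assumed to be the embedding of a differently augmented segment of the same utterance.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length (guarding against the zero vector).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def info_nce(za, zb, temperature=0.1):
    # Contrastive term: row i of za should be closest to row i of zb,
    # with the other rows of zb acting as negatives.
    za = [l2_normalize(v) for v in za]
    zb = [l2_normalize(v) for v in zb]
    loss = 0.0
    for i, a in enumerate(za):
        logits = [sum(x * y for x, y in zip(a, b)) / temperature for b in zb]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_sum - logits[i]
    return loss / len(za)

def vicreg_terms(za, zb, eps=1e-4):
    # VICReg-style terms (assumed stand-in for "information maximization").
    # Requires a batch of at least two embeddings.
    n, d = len(za), len(za[0])
    # Invariance: mean squared distance between the two views.
    invariance = sum(
        (x - y) ** 2 for a, b in zip(za, zb) for x, y in zip(a, b)
    ) / (n * d)
    variance = 0.0
    covariance = 0.0
    for z in (za, zb):
        means = [sum(row[j] for row in z) / n for j in range(d)]
        centered = [[row[j] - means[j] for j in range(d)] for row in z]
        for j in range(d):
            for k in range(d):
                c = sum(row[j] * row[k] for row in centered) / (n - 1)
                if j == k:
                    # Variance: hinge keeps each dimension's std near 1.
                    variance += max(0.0, 1.0 - math.sqrt(c + eps)) / d
                else:
                    # Covariance: penalize correlated dimensions.
                    covariance += c * c / d
    return invariance, variance, covariance

def combined_loss(za, zb, lam=1.0, mu=1.0, nu=0.04, alpha=1.0):
    # Weighted sum of both objectives; all weights are placeholders.
    inv, var, cov = vicreg_terms(za, zb)
    return lam * inv + mu * var + nu * cov + alpha * info_nce(za, zb)

# Toy batch: two utterances, two-dimensional embeddings per view.
view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b = [[0.9, 0.1], [0.1, 0.9]]
loss = combined_loss(view_a, view_b)
```

In this sketch the variance and covariance terms need no negative pairs, while the InfoNCE term treats the other utterances in the batch as negatives, which is one way to read the abstract's distinction between working "without contrastive samples" and adding a contrastive loss.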
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
