Label-Efficient Self-Supervised Speaker Verification With Information Maximization and Contrastive Learning

Théo Lepage; Réda Dehak

arXiv:2207.05506·eess.AS·June 25, 2025

TL;DR

This paper proposes a self-supervised learning approach for speaker verification that leverages information maximization and contrastive learning to produce robust speaker embeddings without extensive labeled data.

Contribution

The paper introduces a self-supervised framework for speaker verification that combines information maximization with contrastive learning, reducing reliance on labeled datasets.

Findings

01  Achieves results competitive with existing methods.

02  Outperforms the supervised baseline when fine-tuned with a limited amount of labeled data.

03  Intensive data augmentation makes the speaker embeddings more robust.

Abstract

State-of-the-art speaker verification systems are inherently dependent on some kind of human supervision as they are trained on massive amounts of labeled data. However, manually annotating utterances is slow, expensive and not scalable to the amount of data available today. In this study, we explore self-supervised learning for speaker verification by learning representations directly from raw audio. The objective is to produce robust speaker embeddings that have small intra-speaker and large inter-speaker variance. Our approach is based on recent information maximization learning frameworks and an intensive data augmentation pre-processing step. We evaluate the ability of these methods to work without contrastive samples before showing that they achieve better performance when combined with a contrastive loss. Furthermore, we conduct experiments to show that our method reaches…
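
To make the combination described in the abstract concrete, here is a minimal sketch of how an information-maximization objective and a contrastive loss could be added together over two augmented views of the same utterances. The abstract does not say which information-maximization framework is used, so the invariance and variance terms below are an assumption in the style of VICReg (with the covariance term left out for brevity), and the contrastive term is a generic InfoNCE loss. All function names, the temperature, and the weighting are hypothetical; this is not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm


def info_nce(view_a, view_b, temperature=0.1):
    """Generic InfoNCE loss: row i of view_a should match row i of view_b
    and differ from every other row of view_b (the contrastive samples)."""
    total = 0.0
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        peak = max(logits)  # subtract the max for numerical stability
        log_sum = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum - logits[i]
    return total / len(view_a)


def invariance_variance(view_a, view_b, target_std=1.0, eps=1e-4):
    """Assumed VICReg-style terms (covariance term omitted):
    invariance pulls the two views of an utterance together, and the
    variance hinge penalizes dimensions whose batch spread collapses."""
    n, dim = len(view_a), len(view_a[0])
    invariance = sum(
        (x - y) ** 2 for a, b in zip(view_a, view_b) for x, y in zip(a, b)
    ) / (n * dim)

    def variance_penalty(batch):
        penalty = 0.0
        for d in range(dim):
            column = [row[d] for row in batch]
            mean = sum(column) / n
            var = sum((x - mean) ** 2 for x in column) / (n - 1)
            penalty += max(0.0, target_std - math.sqrt(var + eps))
        return penalty / dim

    return invariance + variance_penalty(view_a) + variance_penalty(view_b)


def combined_loss(view_a, view_b, contrastive_weight=1.0):
    """Hypothetical weighted sum of the two objectives."""
    return invariance_variance(view_a, view_b) + contrastive_weight * info_nce(
        view_a, view_b
    )
```

With toy two-dimensional embeddings, views that line up utterance by utterance should score a lower combined loss than views whose rows are swapped, and a batch where every embedding is identical is penalized by the variance term even though its invariance term is zero.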

Code & Models

Repositories

theolepage/sslsv (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing