Self-Supervised Frameworks for Speaker Verification via Bootstrapped Positive Sampling

Theo Lepage; Reda Dehak

arXiv:2501.17772 · eess.AS · July 28, 2025

TL;DR

This paper introduces Self-Supervised Positive Sampling (SSPS), a bootstrapped positive sampling method for SSL in speaker verification, which improves performance and robustness by selecting diverse positives close to anchors in representation space.

Contribution

The paper proposes SSPS, a novel bootstrapped positive sampling strategy that enhances SSL frameworks for speaker verification by reducing channel bias and intra-class variance.

Findings

1. SSPS improves SSL speaker verification performance on VoxCeleb benchmarks.
2. SimCLR with SSPS achieves 2.57% EER, comparable to DINO.
3. SSPS reduces intra-class variance and channel information in speaker representations.

Abstract

Recent developments in Self-Supervised Learning (SSL) have demonstrated significant potential for Speaker Verification (SV), but closing the performance gap with supervised systems remains an ongoing challenge. SSL frameworks rely on anchor-positive pairs, constructed from segments of the same audio utterance. Hence, positives have channel characteristics similar to those of their corresponding anchors, even with extensive data-augmentation. Therefore, this positive sampling strategy is a fundamental limitation as it encodes too much information regarding the recording source in the learned representations. This article introduces Self-Supervised Positive Sampling (SSPS), a bootstrapped technique for sampling appropriate and diverse positives in SSL frameworks for SV. SSPS samples positives close to their anchor in the representation space, assuming that these pseudo-positives belong to…
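The idea of sampling a positive that lies close to its anchor in the representation space can be illustrated with a minimal sketch. This is not the authors' implementation: the embedding bank, the function names, the use of cosine similarity, and the choice of drawing uniformly among the k nearest neighbours are all assumptions made for illustration only.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def sample_pseudo_positive(anchor_idx, bank, k=2, rng=None):
    """Return the index of a pseudo-positive for the anchor at anchor_idx.

    bank: list of embedding vectors, one per utterance (hypothetical bank).
    k: size of the neighbourhood to draw from (assumed hyperparameter).
    """
    if rng is None:
        rng = random.Random()
    anchor = bank[anchor_idx]
    # Score every other utterance by its similarity to the anchor.
    candidates = [
        (cosine_similarity(anchor, emb), idx)
        for idx, emb in enumerate(bank)
        if idx != anchor_idx
    ]
    # Keep the k most similar utterances and pick one of them at random.
    candidates.sort(reverse=True)
    neighbours = [idx for _, idx in candidates[:k]]
    return rng.choice(neighbours)


# Toy 2-D embeddings: utterances 0, 1 and 2 point in similar directions.
bank = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.8, 0.3],
    [0.0, 1.0],
    [-1.0, 0.2],
]
positive_idx = sample_pseudo_positive(0, bank, k=2, rng=random.Random(0))
```

Because the positive comes from a different utterance than the anchor, it need not share the anchor's recording conditions, which is the limitation of same-utterance sampling that the abstract describes.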

Code & Models

Repositories

theolepage/sslsv (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing

Methods: Attention Is All You Need · Linear Layer · Normalized Temperature-scaled Cross Entropy Loss · Multi-Head Attention · Max Pooling · Layer Normalization · Softmax · Convolution