Self-Supervised Frameworks for Speaker Verification via Bootstrapped Positive Sampling
Theo Lepage, Reda Dehak

TL;DR
This paper introduces Self-Supervised Positive Sampling (SSPS), a bootstrapped positive sampling method for self-supervised learning (SSL) in speaker verification. It improves performance and robustness by selecting diverse positives that lie close to their anchors in the representation space.
Contribution
The paper proposes SSPS, a novel bootstrapped positive sampling strategy that enhances SSL frameworks for speaker verification by reducing channel bias and intra-class variance.
Findings
SSPS improves SSL speaker verification performance on VoxCeleb benchmarks.
SimCLR with SSPS achieves a 2.57% equal error rate (EER), comparable to DINO.
SSPS reduces intra-class variance and channel information in speaker representations.
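The equal error rate (EER) quoted in the findings is the operating point at which the false acceptance rate equals the false rejection rate. The sketch below shows one simple way to estimate it from trial scores. The function name `equal_error_rate` and the threshold sweep over observed scores are illustrative assumptions, not the evaluation code used by the authors.

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Estimate the EER from verification scores (hypothetical helper).

    scores: similarity score per trial (higher means "same speaker").
    labels: 1/True for target trials, 0/False for non-target trials.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)

    # Sweep every observed score as a candidate decision threshold.
    thresholds = np.unique(scores)

    # False acceptance: non-target trials scored at or above the threshold.
    far = np.array([(scores[~labels] >= t).mean() for t in thresholds])
    # False rejection: target trials scored below the threshold.
    frr = np.array([(scores[labels] < t).mean() for t in thresholds])

    # Take the threshold where the two error rates are closest.
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2
```

With few trials this estimate is coarse; evaluation toolkits usually interpolate between thresholds, which this sketch does not do.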
Abstract
Recent developments in Self-Supervised Learning (SSL) have demonstrated significant potential for Speaker Verification (SV), but closing the performance gap with supervised systems remains an ongoing challenge. SSL frameworks rely on anchor-positive pairs, constructed from segments of the same audio utterance. Hence, positives have channel characteristics similar to those of their corresponding anchors, even with extensive data-augmentation. Therefore, this positive sampling strategy is a fundamental limitation as it encodes too much information regarding the recording source in the learned representations. This article introduces Self-Supervised Positive Sampling (SSPS), a bootstrapped technique for sampling appropriate and diverse positives in SSL frameworks for SV. SSPS samples positives close to their anchor in the representation space, assuming that these pseudo-positives belong to…
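The idea of sampling positives close to the anchor in the representation space can be illustrated with a nearest-neighbour lookup in a bank of stored embeddings. This is a minimal sketch under stated assumptions, not the authors' implementation: the function `sample_pseudo_positives`, the memory bank, the utterance-ID arrays, and the choice of `k` are all hypothetical, and the sketch assumes the bank holds at least `k` embeddings from utterances other than the anchor's.

```python
import numpy as np

def sample_pseudo_positives(anchors, bank, anchor_utt_ids, bank_utt_ids, k=5, rng=None):
    """Pick one pseudo-positive bank index per anchor (hypothetical helper).

    anchors:        (n, d) anchor embeddings.
    bank:           (m, d) stored embeddings, e.g. from earlier training steps.
    anchor_utt_ids: (n,) utterance ID of each anchor.
    bank_utt_ids:   (m,) utterance ID of each bank entry.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Cosine similarity between every anchor and every bank entry.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    b = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    sim = a @ b.T  # shape (n, m)

    # Exclude entries from the anchor's own utterance, so the positive
    # comes from a different recording.
    same_utt = anchor_utt_ids[:, None] == bank_utt_ids[None, :]
    sim = np.where(same_utt, -np.inf, sim)

    # Keep the k most similar candidates and draw one at random for diversity.
    topk = np.argsort(-sim, axis=1)[:, :k]
    choice = rng.integers(0, k, size=len(anchors))
    return topk[np.arange(len(anchors)), choice]
```

The returned indices point into the bank. The selected embeddings would then replace the usual same-utterance positives in the SSL objective, on the assumption that close neighbours share the anchor's speaker identity.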
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing
Methods: Attention Is All You Need · Linear Layer · Normalized Temperature-scaled Cross Entropy Loss · Multi-Head Attention · Max Pooling · Layer Normalization · Softmax · Convolution
