Self-Supervised Speaker Verification with Simple Siamese Network and Self-Supervised Regularization
Mufan Sang, Haoqi Li, Fang Liu, Andrew O. Arnold, Li Wan

TL;DR
This paper introduces a self-supervised learning framework with a novel regularization method for speaker verification, achieving significant performance improvements without using speaker labels or negative pairs.
Contribution
It proposes a simple Siamese network with self-supervised regularization that enhances speaker representation learning without contrastive loss or negative pairs.
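For intuition, here is a minimal sketch of a positive-pair-only Siamese objective. It uses the SimSiam-style formulation: a shared encoder, a predictor, and a symmetric negative cosine similarity in which the target branch would be stop-gradiented. This is a swapped-in, well-known technique and is not claimed to be the authors' exact SSReg loss. The names `siamese_positive_pair_loss`, `encoder_w`, and `predictor_w`, as well as the linear layers, are hypothetical. The sketch only computes the forward loss value.

```python
import math


def l2_normalize(v):
    # Scale a vector to unit length (guard against the zero vector).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine(a, b):
    # Cosine similarity of two vectors, in [-1, 1].
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def linear(weights, x):
    # Hypothetical stand-in for a network layer: returns W @ x (W is a list of rows).
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def siamese_positive_pair_loss(encoder_w, predictor_w, view_a, view_b):
    """Symmetric negative cosine similarity between two views of one utterance.

    Both views pass through the SAME encoder weights (the Siamese part), and
    only the positive pair is compared, so no negative pairs are needed.
    In an autodiff framework, z_a and z_b would be wrapped in a stop-gradient
    (e.g. PyTorch's .detach()); this forward-only sketch just returns the value.
    """
    z_a = linear(encoder_w, view_a)  # embedding of view A
    z_b = linear(encoder_w, view_b)  # embedding of view B, same weights
    p_a = linear(predictor_w, z_a)   # predictor output for view A
    p_b = linear(predictor_w, z_b)   # predictor output for view B
    # Each prediction is pulled toward the other view's (target) embedding.
    return -0.5 * (cosine(p_a, z_b) + cosine(p_b, z_a))


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    x = [0.3, -1.2, 0.8]
    # Identical views with identity layers give a loss close to -1.0 (the minimum).
    print(siamese_positive_pair_loss(identity, identity, x, x))
```

The loss is bounded in [-1, 1] and reaches its minimum when both views map to the same direction. In the SimSiam formulation, the predictor and stop-gradient are what keep such a negative-free objective from collapsing to a constant embedding.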
Findings
Achieves a 23.4% relative improvement on the VoxCeleb datasets by adding the self-supervised regularization.
Online data augmentation in both the time and frequency domains further enhances performance.
Outperforms previous self-supervised speaker verification methods.
Abstract
Training speaker-discriminative and robust speaker verification systems without speaker labels is still challenging and worthwhile to explore. In this study, we propose an effective self-supervised learning framework and a novel regularization strategy to facilitate self-supervised speaker representation learning. Different from contrastive learning-based self-supervised learning methods, the proposed self-supervised regularization (SSReg) focuses exclusively on the similarity between the latent representations of positive data pairs. We also explore the effectiveness of alternative online data augmentation strategies on both the time domain and frequency domain. With our strong online data augmentation strategy, the proposed SSReg shows the potential of self-supervised learning without using negative pairs and it can significantly improve the performance of self-supervised speaker…
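The abstract does not spell out the augmentation recipe. The following is a hypothetical sketch of one time-domain augmentation (additive noise mixed at a target signal-to-noise ratio) and one frequency-domain augmentation (SpecAugment-style frequency masking on a spectrogram). Because they are applied "online", each training step would draw a fresh random view. The function names and parameters (`add_noise_at_snr`, `frequency_mask`, `snr_db`, `max_width`) are illustrative assumptions, not the paper's settings.

```python
import math
import random


def add_noise_at_snr(signal, noise, snr_db):
    """Time-domain augmentation: mix `noise` into `signal` at `snr_db` decibels.

    Assumes `noise` is at least as long as `signal` (extra samples are ignored).
    """
    p_sig = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise[: len(signal)]) / len(signal)
    if p_noise == 0:
        return list(signal)  # silent noise clip: nothing to mix in
    # Choose scale so that p_sig / (scale**2 * p_noise) == 10 ** (snr_db / 10).
    scale = math.sqrt(p_sig / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(signal, noise)]


def frequency_mask(spec, max_width, rng):
    """Frequency-domain augmentation: zero one random band of frequency bins.

    `spec` is a list of frames, each a list of per-bin values; the same band
    is masked in every frame.
    """
    n_bins = len(spec[0])
    width = rng.randint(0, min(max_width, n_bins))  # band width, may be 0
    start = rng.randint(0, n_bins - width)          # first masked bin
    return [
        [0.0 if start <= k < start + width else v for k, v in enumerate(frame)]
        for frame in spec
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [math.sin(0.1 * t) for t in range(160)]
    babble = [rng.gauss(0.0, 1.0) for _ in range(160)]
    noisy = add_noise_at_snr(clean, babble, snr_db=5.0)   # one augmented view
    masked = frequency_mask([[1.0] * 40 for _ in range(10)], 8, rng)
    print(len(noisy), sum(v == 0.0 for v in masked[0]))
```

Applying a different random draw of these transforms to two crops of the same utterance is one way to build the positive pairs that a negative-free Siamese objective compares.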
Taxonomy
Methods: Siamese Network
