Discriminative Speaker Representation via Contrastive Learning with Class-Aware Attention in Angular Space
Zhe Li, Man-Wai Mak, Helen Mei-Ling Meng

TL;DR
This paper introduces a supervised contrastive learning framework for speaker verification that adds an additive angular margin to the loss to sharpen speaker discrimination and uses class-aware attention to reduce the influence of hard negatives.
Contribution
It proposes a supervised contrastive loss with an additive angular margin and class-aware attention, balanced against the classification loss through gradient-based multi-objective optimization, to improve speaker embeddings for verification.
Findings
Improved speaker discrimination in embedding space.
Enhanced robustness against hard negative samples.
Superior performance on CN-Celeb and Voxceleb1 datasets.
Abstract
The challenges in applying contrastive learning to speaker verification (SV) are that the softmax-based contrastive loss lacks discriminative power and that the hard negative pairs can easily influence learning. To overcome the first challenge, we propose a contrastive learning SV framework incorporating an additive angular margin into the supervised contrastive loss in which the margin improves the speaker representation's discrimination ability. For the second challenge, we introduce a class-aware attention mechanism through which hard negative samples contribute less significantly to the supervised contrastive loss. We also employed gradient-based multi-objective optimization to balance the classification and contrastive loss. Experimental results on CN-Celeb and Voxceleb1 show that this new learning objective can cause the encoder to find an embedding space that exhibits great…
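The first idea in the abstract, adding an angular margin to the supervised contrastive loss, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `aam_supcon_loss`, the default `margin` and `temperature` values, and the choice to apply the margin to positive-pair angles only are assumptions made for illustration. The class-aware attention and the multi-objective balancing are not modelled here, because the summary above does not give their formulas.

```python
import numpy as np

def aam_supcon_loss(embeddings, labels, margin=0.2, temperature=0.1):
    """Supervised contrastive loss with an additive angular margin (sketch).

    Assumption: positive pairs use cos(theta + margin) and all other pairs
    use cos(theta). The paper's exact formulation may differ.
    """
    z = np.asarray(embeddings, dtype=float)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit-length embeddings
    cos = np.clip(z @ z.T, -1.0, 1.0)                  # pairwise cosine similarity
    theta = np.arccos(cos)                             # pairwise angles

    labels = np.asarray(labels)
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)
    pos = (labels[:, None] == labels[None, :]) & not_self  # same speaker, not the anchor

    # Penalise positives by enlarging their angle; this simple form assumes
    # theta + margin stays below pi.
    logits = np.where(pos, np.cos(theta + margin), cos) / temperature
    logits = np.where(not_self, logits, -np.inf)       # drop the anchor itself

    # Numerically stable log-softmax over each row.
    row_max = logits.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(np.exp(logits - row_max).sum(axis=1, keepdims=True))
    log_prob = logits - log_denom

    pos_count = pos.sum(axis=1)
    valid = pos_count > 0                              # anchors with at least one positive
    if not valid.any():
        return 0.0
    sum_pos = np.where(pos, log_prob, 0.0).sum(axis=1)
    return float(-(sum_pos[valid] / pos_count[valid]).mean())
```

With a positive `margin`, a same-speaker pair has to be closer in angle than before to reach the same loss value, which is the intuition behind the claimed gain in discrimination. Setting `margin=0.0` reduces the sketch to an ordinary supervised contrastive loss.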
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
Methods: Supervised Contrastive Loss · Contrastive Learning
