Speaker Representation Learning via Contrastive Loss with Maximal Speaker Separability
Zhe Li, Man-Wai Mak

TL;DR
This paper introduces a supervised contrastive learning objective combined with an additive angular-margin loss to improve speaker discrimination in the embedding space, and demonstrates its effectiveness on the CN-Celeb dataset.
Contribution
The paper combines a supervised contrastive loss with an additive angular-margin loss for speaker representation learning, aiming to improve the discrimination of unseen speakers in unseen domains.
Findings
Improved speaker discrimination on the CN-Celeb dataset.
An embedding space in which same-speaker utterance pairs lie close together and different-speaker pairs lie far apart.
An easy-to-implement contrastive learning framework, with code available.
Abstract
A great challenge in speaker representation learning using deep models is to design learning objectives that can enhance the discrimination of unseen speakers under unseen domains. This work proposes a supervised contrastive learning objective to learn a speaker embedding space by effectively leveraging the label information in the training data. In such a space, utterance pairs spoken by the same or similar speakers will stay close, while utterance pairs spoken by different speakers will be far apart. For each training speaker, we perform random data augmentation on their utterances to form positive pairs, and utterances from different speakers form negative pairs. To maximize speaker separability in the embedding space, we incorporate the additive angular-margin loss into the contrastive learning objective. Experimental results on CN-Celeb show that this new learning objective can…
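The objective described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `supcon_aam_loss`, the `margin` and `scale` values, and the choice to apply the margin only to same-speaker (positive) pairs are assumptions made for illustration. Augmented copies of an utterance are assumed to carry the same speaker label as the original, so they appear here as ordinary positives.

```python
import numpy as np


def supcon_aam_loss(embeddings, labels, margin=0.2, scale=30.0):
    """Supervised contrastive loss with an additive angular margin (sketch).

    embeddings: (N, D) array of utterance embeddings (N >= 2).
    labels:     (N,) speaker labels; augmented views share their source label.
    margin, scale: hypothetical hyperparameters, not taken from the paper.
    """
    emb = np.asarray(embeddings, dtype=np.float64)
    labels = np.asarray(labels)
    n = emb.shape[0]

    # L2-normalise so that dot products are cosines of the pairwise angles.
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    cos = np.clip(emb @ emb.T, -1.0 + 1e-7, 1.0 - 1e-7)

    not_self = ~np.eye(n, dtype=bool)
    pos = (labels[:, None] == labels[None, :]) & not_self
    if not pos.any():
        raise ValueError("batch contains no same-speaker pairs")

    # Assumption: add the margin to the angle of positive pairs only, which
    # lowers their similarity and forces same-speaker embeddings closer.
    logits = scale * np.where(pos, np.cos(np.arccos(cos) + margin), cos)
    logits = np.where(not_self, logits, -np.inf)  # exclude self-similarity

    # Numerically stable log-softmax over all other samples in the batch.
    row_max = logits.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(np.exp(logits - row_max).sum(axis=1, keepdims=True))
    log_prob = logits - log_denom

    # Average the negative log-probability over each anchor's positives,
    # then over the anchors that have at least one positive.
    pos_count = pos.sum(axis=1)
    has_pos = pos_count > 0
    pos_log_prob = np.where(pos, log_prob, 0.0).sum(axis=1)
    per_anchor = -pos_log_prob[has_pos] / pos_count[has_pos]
    return float(per_anchor.mean())


# Toy batch: two speakers, two utterances each (values are illustrative).
demo_embeddings = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]]
demo_labels = [0, 0, 1, 1]
demo_loss = supcon_aam_loss(demo_embeddings, demo_labels)
```

In this sketch a larger `margin` raises the loss for a fixed set of embeddings, so the model has to shrink the angle between same-speaker utterances further to reach the same loss value. The sketch does not handle the case where an angle plus the margin exceeds π, which margin-based losses usually treat separately.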
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
Methods: Contrastive Learning
