Speaker Representation Learning via Contrastive Loss with Maximal Speaker Separability

Zhe Li; Man-Wai Mak

arXiv:2210.16636·eess.AS·November 18, 2022

TL;DR

This paper introduces a supervised contrastive learning objective combined with an additive angular-margin loss to improve speaker discrimination in the embedding space, with experiments on the CN-Celeb dataset showing improved discrimination.

Contribution

The paper combines a supervised contrastive loss with an additive angular-margin loss for speaker representation learning, aiming to improve the discrimination of unseen speakers in unseen domains.

Findings

01 Improved speaker discrimination on the CN-Celeb dataset.

02 An embedding space in which same-speaker utterance pairs lie close together and different-speaker pairs lie far apart.

03 An easy-to-implement contrastive learning framework, with code publicly available.

Abstract

A great challenge in speaker representation learning using deep models is to design learning objectives that can enhance the discrimination of unseen speakers under unseen domains. This work proposes a supervised contrastive learning objective to learn a speaker embedding space by effectively leveraging the label information in the training data. In such a space, utterance pairs spoken by the same or similar speakers will stay close, while utterance pairs spoken by different speakers will be far apart. For each training speaker, we perform random data augmentation on their utterances to form positive pairs, and utterances from different speakers form negative pairs. To maximize speaker separability in the embedding space, we incorporate the additive angular-margin loss into the contrastive learning objective. Experimental results on CN-Celeb show that this new learning objective can…
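The objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `aam_supcon_loss`, the `margin` and `scale` values, and the choice to apply the margin only to the positive-pair angle (with only different-speaker terms in the rest of the denominator) are all assumptions made for illustration, and the paper's exact formulation may differ. Embeddings would normally come from a speaker encoder applied to augmented utterances; here they are plain lists of floats.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def aam_supcon_loss(embeddings, labels, margin=0.2, scale=30.0):
    """Supervised contrastive loss with an additive angular margin (sketch).

    Assumptions (not taken from the paper): the margin is added to the
    angle of each positive pair, and the denominator holds that positive
    term plus all different-speaker (negative) terms. No special handling
    is done for angles where theta + margin exceeds pi.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, num_anchors = 0.0, 0
    for i in range(n):
        # Positives: other utterances carrying the same speaker label.
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Cosine similarity of the anchor to every utterance, clamped for acos.
        cos = [max(-1.0, min(1.0, sum(a * b for a, b in zip(z[i], z[j]))))
               for j in range(n)]
        # Negatives: utterances from different speakers, no margin applied.
        neg_sum = sum(math.exp(scale * cos[j])
                      for j in range(n) if labels[j] != labels[i])
        anchor_loss = 0.0
        for p in positives:
            theta = math.acos(cos[p])
            # The margin makes the positive pair look less similar, so the
            # model must pull same-speaker embeddings closer to compensate.
            pos_term = math.exp(scale * math.cos(theta + margin))
            anchor_loss += -math.log(pos_term / (pos_term + neg_sum))
        total += anchor_loss / len(positives)
        num_anchors += 1
    return total / num_anchors if num_anchors else 0.0


# Toy batch: two speakers, two utterances each, in a 2-D embedding space.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_labels = [0, 0, 1, 1]
loss_with_margin = aam_supcon_loss(toy_embeddings, toy_labels, margin=0.2)
loss_without_margin = aam_supcon_loss(toy_embeddings, toy_labels, margin=0.0)
```

On this toy batch the loss with a margin is larger than the loss without one, which is the intended effect: the margin tightens the requirement on same-speaker pairs, pushing training toward a more separable embedding space.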

Code & Models

Repositories

shanmon110/aamsupcon (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing

Methods: Contrastive Learning