Semi-Supervised Contrastive Learning with Generalized Contrastive Loss and Its Application to Speaker Recognition

Nakamasa Inoue; Keita Goto

arXiv:2006.04326 · eess.AS · June 9, 2020 · 29 citations

TL;DR

This paper presents a semi-supervised contrastive learning framework built on a generalized contrastive loss (GCL) that unifies supervised and unsupervised learning, and applies it to text-independent speaker verification.

Contribution

It introduces a generalized contrastive loss that unifies supervised metric learning and unsupervised contrastive learning in a single loss function, and applies it to speaker recognition.

Findings

01

GCL enables learning speaker embeddings in supervised, semi-supervised, and unsupervised modes.

02

The framework improves speaker verification performance on the VoxCeleb dataset.

03

A single unified loss function simplifies the implementation of semi-supervised learning.

Abstract

This paper introduces a semi-supervised contrastive learning framework and its application to text-independent speaker verification. The proposed framework employs a generalized contrastive loss (GCL). GCL unifies the losses of two different learning frameworks, supervised metric learning and unsupervised contrastive learning, and thus naturally defines a loss for semi-supervised learning. In experiments, we applied the proposed framework to text-independent speaker verification on the VoxCeleb dataset. We demonstrate that GCL enables the learning of speaker embeddings in three manners (supervised, semi-supervised, and unsupervised learning) without any change to the definition of the loss function.
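
The abstract's central idea is that one loss definition covers all three training regimes. The sketch below illustrates how that can work, assuming a supervised-contrastive-style (SupCon-style) formulation rather than the paper's exact GCL definition. Each pair of segments in a batch is marked positive if both carry the same speaker label, or if they are two augmented views of the same utterance. A temperature-scaled softmax over cosine similarities then does the rest. The names `positive_mask`, `unified_contrastive_loss`, `source_ids`, and `temperature` (default 0.1) are hypothetical and introduced here only for illustration.

```python
import math
from typing import List, Optional, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def positive_mask(
    labels: Sequence[Optional[int]], source_ids: Sequence[int]
) -> List[List[bool]]:
    """Hypothetical helper: mark pair (i, j), i != j, as positive when both
    segments share a speaker label, or when they are two augmented views of
    the same source utterance. Unlabeled segments use label None."""
    n = len(labels)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a segment is never its own positive
            same_speaker = labels[i] is not None and labels[i] == labels[j]
            same_source = source_ids[i] == source_ids[j]
            mask[i][j] = same_speaker or same_source
    return mask


def unified_contrastive_loss(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[Optional[int]],
    source_ids: Sequence[int],
    temperature: float = 0.1,
) -> float:
    """SupCon-style loss driven only by the positive mask (an assumption,
    not the paper's exact GCL). Averages over anchors with >= 1 positive."""
    n = len(embeddings)
    sims = [
        [cosine(embeddings[i], embeddings[j]) / temperature for j in range(n)]
        for i in range(n)
    ]
    mask = positive_mask(labels, source_ids)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if mask[i][j]]
        if not positives:
            continue  # an anchor with no positive pair contributes nothing
        # Log of the softmax denominator over all non-anchor segments,
        # computed with the log-sum-exp trick for numerical stability.
        others = [sims[i][a] for a in range(n) if a != i]
        peak = max(others)
        log_denom = peak + math.log(sum(math.exp(s - peak) for s in others))
        # Mean negative log-probability of the anchor's positives.
        total += -sum(sims[i][p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


if __name__ == "__main__":
    emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    # Supervised: every segment labeled, no shared source utterances.
    sup = unified_contrastive_loss(emb, [0, 0, 1, 1], [0, 1, 2, 3])
    # Unsupervised: no labels; segments 0/1 and 2/3 are views of one utterance.
    unsup = unified_contrastive_loss(emb, [None] * 4, [0, 0, 1, 1])
    # Semi-supervised: first two labeled, last two unlabeled views.
    semi = unified_contrastive_loss(emb, [0, 0, None, None], [0, 1, 2, 2])
    print(sup, unsup, semi)
```

The three calls in the usage block build the same pair structure ({0, 1} and {2, 3}), so they return the same value; only the origin of the positive pairs (speaker labels, augmentation, or a mix) changes. A practical system would compute this on batched tensors in a deep-learning framework rather than with nested Python loops.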

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing