Semi-Supervised Contrastive Learning with Generalized Contrastive Loss and Its Application to Speaker Recognition
Nakamasa Inoue, Keita Goto

TL;DR
This paper presents a semi-supervised contrastive learning framework using a generalized contrastive loss that unifies supervised and unsupervised learning, applied effectively to speaker verification tasks.
Contribution
The paper introduces a generalized contrastive loss (GCL) that covers both supervised metric learning and unsupervised contrastive learning with a single loss definition, and applies it to speaker recognition.
Findings
GCL enables learning speaker embeddings in supervised, semi-supervised, and unsupervised modes.
The framework improves speaker verification performance on the VoxCeleb dataset.
A unified loss function simplifies the implementation of semi-supervised learning.
Abstract
This paper introduces a semi-supervised contrastive learning framework and its application to text-independent speaker verification. The proposed framework employs generalized contrastive loss (GCL). GCL unifies the losses of two different learning frameworks, supervised metric learning and unsupervised contrastive learning, and thus naturally determines the loss for semi-supervised learning. In experiments, we applied the proposed framework to text-independent speaker verification on the VoxCeleb dataset. We demonstrate that GCL enables the learning of speaker embeddings in three manners (supervised learning, semi-supervised learning, and unsupervised learning) without any changes in the definition of the loss function.
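To make the idea of one loss covering labeled and unlabeled data more concrete, here is a minimal sketch in Python with NumPy. It is not the paper's GCL definition, which this summary does not reproduce; it is a generic NT-Xent/SupCon-style stand-in. The function name `semi_supervised_contrastive_loss`, the `temperature` parameter, the use of `-1` to mark unlabeled utterances, and the batch layout (rows `i` and `i + N` are two segments of the same utterance) are all assumptions made for illustration.

```python
import numpy as np

def semi_supervised_contrastive_loss(emb, labels, temperature=0.1):
    """Illustrative contrastive loss over a mixed labeled/unlabeled batch.

    emb:    (2N, D) embeddings; rows i and i+N come from the same utterance
            (assumed layout, not taken from the paper).
    labels: (2N,) integer speaker ids, with -1 meaning "unlabeled"
            (assumed convention).
    """
    n2 = emb.shape[0]
    n = n2 // 2
    idx = np.arange(n2)

    # Cosine similarity scaled by a temperature.
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = emb @ emb.T / temperature

    self_mask = np.eye(n2, dtype=bool)

    # Unsupervised positives: the other segment of the same utterance.
    pos = np.zeros((n2, n2), dtype=bool)
    pos[idx, (idx + n) % n2] = True

    # Supervised positives: both samples labeled and from the same speaker.
    labeled = labels >= 0
    same_speaker = labels[:, None] == labels[None, :]
    pos |= same_speaker & labeled[:, None] & labeled[None, :]
    pos &= ~self_mask

    # Log-softmax over all other samples in the batch (self excluded).
    sim = np.where(self_mask, -np.inf, sim)
    m = sim.max(axis=1, keepdims=True)
    log_prob = sim - (m + np.log(np.exp(sim - m).sum(axis=1, keepdims=True)))

    # Average negative log-probability of the positives for each anchor.
    pos_log_prob = np.where(pos, log_prob, 0.0)
    per_anchor = -pos_log_prob.sum(axis=1) / pos.sum(axis=1)
    return per_anchor.mean()

# Usage: four utterances, two segments each; the first two share a speaker
# label and the last two are unlabeled.
rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))
labels = np.array([0, 0, -1, -1, 0, 0, -1, -1])
loss = semi_supervised_contrastive_loss(emb, labels)
```

With every label set to `-1` the positive mask contains only the paired segments, so the same function acts as a purely unsupervised loss; with every sample labeled it acts as a supervised one. That mirrors, in a simplified way, the abstract's point that the three training modes need no change to the loss definition.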
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
