Unsupervised Representation Learning for Speaker Recognition via Contrastive Equilibrium Learning

Sung Hwan Mun; Woo Hyun Kang; Min Hyun Han; Nam Soo Kim

arXiv:2010.11433 · eess.AS · October 23, 2020 · 1 citation

TL;DR

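This paper introduces Contrastive Equilibrium Learning (CEL), an unsupervised method for speaker recognition. CEL pairs a uniformity loss, which increases uncertainty about nuisance factors in the embeddings, with a contrastive similarity loss that preserves speaker discriminability, and it outperforms prior unsupervised speaker verification systems.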

Contribution

The paper presents CEL, an unsupervised learning approach that trains speaker embeddings with a combination of a uniformity loss and a contrastive similarity loss, and reports results that surpass the previous state-of-the-art unsupervised speaker verification systems.

Findings

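01 CEL outperforms existing unsupervised speaker verification systems.

02 Pre-training with CEL improves the performance of supervised speaker embedding networks compared with random initialization.

03 The best-performing model achieved 8.01% EER on the VoxCeleb1 evaluation set and 4.01% EER on the VOiCES evaluation set.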
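The findings above are reported as equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. As a rough illustration of how such a figure is obtained from verification scores, here is a minimal sketch in plain Python. The function name `equal_error_rate` and the toy scores are assumptions made for this example; this is not the paper's evaluation code, and it approximates the EER at the threshold where the two error rates are closest rather than interpolating.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER from same-speaker and different-speaker trial scores.

    Hypothetical helper for illustration: higher scores are assumed to mean
    "more likely the same speaker".
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), None
    for th in thresholds:
        # False rejection: a same-speaker trial scored below the threshold.
        frr = sum(s < th for s in target_scores) / len(target_scores)
        # False acceptance: a different-speaker trial scored at or above it.
        far = sum(s >= th for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy trial scores (made up for this example, not taken from the paper).
same_speaker = [0.9, 0.8, 0.7, 0.3]
different_speaker = [0.6, 0.4, 0.2, 0.1]
print(equal_error_rate(same_speaker, different_speaker))
```

An EER of 8.01% therefore means that, at the balancing threshold, roughly 8 in 100 trials of each type are misclassified.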

Abstract

In this paper, we propose a simple but powerful unsupervised learning method for speaker recognition, namely Contrastive Equilibrium Learning (CEL), which increases the uncertainty on nuisance factors latent in the embeddings by employing the uniformity loss. To preserve speaker discriminability, a contrastive similarity loss function is used alongside it. Experimental results showed that the proposed CEL significantly outperforms the state-of-the-art unsupervised speaker verification systems, and the best-performing model achieved 8.01% and 4.01% EER on the VoxCeleb1 and VOiCES evaluation sets, respectively. In addition, supervised speaker embedding networks initialized with parameters pre-trained via CEL performed better than those trained from randomly initialized parameters.
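The abstract describes the training objective only in words: a uniformity loss combined with a contrastive similarity loss. The sketch below shows one way such a combination could look in plain Python. Everything specific in it is an assumption for illustration: the uniformity term uses a common Gaussian-potential formulation (log of the mean of exp(-t·‖x − y‖²) over pairs of unit-length embeddings), the similarity term is a softmax cross-entropy over scaled cosine similarities between two views of each utterance, and the names `uniformity_loss`, `contrastive_similarity_loss`, `cel_style_loss` and the parameters `t`, `scale`, `weight` are hypothetical. The paper's exact losses and hyperparameters may differ.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def uniformity_loss(embeddings, t=2.0):
    """Log of the mean Gaussian potential over all distinct pairs.

    Lower values mean the embeddings are spread more evenly on the unit
    hypersphere. Assumed formulation, not confirmed by the abstract.
    """
    z = [l2_normalize(e) for e in embeddings]
    potentials = []
    for i in range(len(z)):
        for j in range(i + 1, len(z)):
            sq_dist = sum((a - b) ** 2 for a, b in zip(z[i], z[j]))
            potentials.append(math.exp(-t * sq_dist))
    return math.log(sum(potentials) / len(potentials))


def contrastive_similarity_loss(view_a, view_b, scale=10.0):
    """Cross-entropy over scaled cosine similarities.

    view_a[i] and view_b[i] are assumed to come from the same utterance
    (positive pair); every view_b[j] with j != i acts as a negative.
    """
    a = [l2_normalize(e) for e in view_a]
    b = [l2_normalize(e) for e in view_b]
    total = 0.0
    for i in range(len(a)):
        logits = [scale * sum(x * y for x, y in zip(a[i], b_j)) for b_j in b]
        # Numerically stable log-sum-exp of the logits.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(a)


def cel_style_loss(view_a, view_b, weight=1.0, t=2.0, scale=10.0):
    """Similarity term plus a weighted uniformity term averaged over both views."""
    uniform = 0.5 * (uniformity_loss(view_a, t) + uniformity_loss(view_b, t))
    return contrastive_similarity_loss(view_a, view_b, scale) + weight * uniform


# Toy 2-D embeddings for two utterances, two views each (made-up values).
views_a = [[1.0, 0.0], [0.0, 1.0]]
views_b = [[1.0, 0.0], [0.0, 1.0]]
print(cel_style_loss(views_a, views_b))
```

In this sketch the two terms pull in different directions: the uniformity term rewards spreading all embeddings apart, while the similarity term keeps the two views of the same utterance close relative to the other utterances in the batch.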

Code & Models

Repositories

msh9184/contrastive-equilibrium-learning (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing