Bootstrap Equilibrium and Probabilistic Speaker Representation Learning for Self-supervised Speaker Verification

Sung Hwan Mun, Min Hyun Han, Dongjune Lee, Jihwan Kim, and Nam Soo Kim

arXiv:2112.08929·eess.AS·December 28, 2021

TL;DR

This paper introduces a self-supervised framework for speaker verification that pairs bootstrap equilibrium representation learning in the front-end with probabilistic speaker embeddings in the back-end, so that each embedding carries an estimate of data uncertainty alongside the speaker representation.

Contribution

It proposes a novel two-stage self-supervised learning approach with bootstrap equilibrium training and uncertainty-aware probabilistic embeddings for speaker verification.

Findings

01

Outperforms contrastive learning methods in speaker representation quality.

02

Significantly reduces equal error rate (EER) and minimum detection cost function (MinDCF) on the VoxCeleb1 dataset.

03

Effectively models data uncertainty in speaker embeddings.
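
To make the idea of uncertainty-aware embeddings concrete, here is a minimal sketch of a mutual likelihood score between two utterances, each represented as a diagonal Gaussian (a mean vector plus a per-dimension variance). The closed form below is the standard one for diagonal Gaussians; whether the paper uses exactly this form is an assumption, and the function name and arguments are illustrative only.

```python
import numpy as np

def mutual_likelihood_score(mu_a, var_a, mu_b, var_b):
    """Log-likelihood that two diagonal-Gaussian embeddings share one latent point.

    mu_*  : mean vectors (the speaker representation)
    var_* : per-dimension variances (the data uncertainty), all > 0
    Assumed form: standard diagonal-Gaussian closed form, not taken from the paper.
    """
    mu_a, var_a, mu_b, var_b = (
        np.asarray(x, dtype=float) for x in (mu_a, var_a, mu_b, var_b)
    )
    var_sum = var_a + var_b                 # uncertainties add up
    dim = mu_a.shape[-1]
    quad = (mu_a - mu_b) ** 2 / var_sum     # distance, discounted by uncertainty
    return (
        -0.5 * np.sum(quad + np.log(var_sum), axis=-1)
        - 0.5 * dim * np.log(2.0 * np.pi)
    )
```

Under this form, two confident embeddings (small variances) with close means score highest, while a large variance both softens the penalty for distant means and lowers the best achievable score. Training would maximize this score for pairs of samples from the same speaker.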

Abstract

In this paper, we propose self-supervised speaker representation learning strategies, which consist of bootstrap equilibrium speaker representation learning in the front-end and uncertainty-aware probabilistic speaker embedding training in the back-end. In the front-end stage, we learn the speaker representations via a bootstrap training scheme with a uniformity regularization term. In the back-end stage, the probabilistic speaker embeddings are estimated by maximizing the mutual likelihood score between speech samples belonging to the same speaker, which provides not only speaker representations but also data uncertainty. Experimental results show that the proposed bootstrap equilibrium training strategy effectively helps learn the speaker representations and outperforms conventional methods based on contrastive learning. Also, we demonstrate that the integrated…
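
The front-end stage described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the bootstrap term uses the BYOL-style normalized regression loss between an online network's prediction and a target network's projection, the uniformity term uses the log of the mean Gaussian potential over pairwise distances, and the weight `lam`, temperature `t`, and momentum value are hypothetical. Which embeddings the paper regularizes for uniformity is also assumed here.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Project each row onto the unit hypersphere."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def bootstrap_loss(online_pred, target_proj):
    """BYOL-style loss: 2 - 2 * cosine similarity, averaged over the batch."""
    p = l2_normalize(online_pred)
    z = l2_normalize(target_proj)
    return float(np.mean(2.0 - 2.0 * np.sum(p * z, axis=-1)))

def uniformity_loss(emb, t=2.0):
    """Log mean Gaussian potential over all distinct pairs in the batch.

    Lower values mean the embeddings are spread more evenly on the sphere,
    which discourages collapse to a single point.
    """
    z = l2_normalize(emb)
    sq_dists = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    upper = np.triu_indices(len(z), k=1)    # each unordered pair once
    return float(np.log(np.mean(np.exp(-t * sq_dists[upper]))))

def ema_update(target_params, online_params, momentum=0.99):
    """Move target weights toward online weights (no gradient step on target)."""
    return [momentum * tp + (1.0 - momentum) * op
            for tp, op in zip(target_params, online_params)]

def total_loss(online_pred, target_proj, lam=1.0):
    """Bootstrap term plus a weighted uniformity regularizer (lam is hypothetical)."""
    return bootstrap_loss(online_pred, target_proj) + lam * uniformity_loss(online_pred)
```

In a training loop, two augmented segments of the same utterance would pass through the online and target networks respectively, `total_loss` would drive the online network's gradient step, and `ema_update` would then refresh the target weights.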
