Speaker-IPL: Unsupervised Learning of Speaker Characteristics with i-Vector based Pseudo-Labels

Zakaria Aldeneh; Takuya Higuchi; Jee-weon Jung; Li-Wei Chen; Stephen Shum; Ahmed Hussen Abdelaziz; Shinji Watanabe; Tatiana Likhomanenko; Barry-John Theobald

arXiv:2409.10791·eess.AS·January 22, 2025

TL;DR

This paper demonstrates that iterative pseudo-labeling using a simple i-vector model can effectively learn speaker representations in an unsupervised manner, rivaling more complex self-supervised approaches.

Contribution

The paper shows that a basic i-vector model suffices to bootstrap IPL for unsupervised speaker recognition, reducing reliance on complex self-supervised models.

Findings

01. i-vector based IPL achieves competitive speaker verification performance.

02. A systematic analysis shows how other components of the pipeline affect IPL effectiveness.

03. Simple models can rival state-of-the-art self-supervised approaches in unsupervised speaker representation learning.

Abstract

Iterative self-training, or iterative pseudo-labeling (IPL) -- using an improved model from the current iteration to provide pseudo-labels for the next iteration -- has proven to be a powerful approach to enhance the quality of speaker representations. Recent applications of IPL in unsupervised speaker recognition start with representations extracted from very elaborate self-supervised methods (e.g., DINO). However, training such strong self-supervised models is not straightforward (they require hyper-parameter tuning and may not generalize to out-of-domain data) and, moreover, may not be needed at all. To this end, we show that the simple, well-studied, and established i-vector generative model is enough to bootstrap the IPL process for the unsupervised learning of speaker representations. We also systematically study the impact of other components on the IPL process, which includes…
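The IPL loop described in the abstract can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: k-means stands in for whatever clustering step produces the pseudo-labels, a linear softmax classifier stands in for the speaker embedding network, and the initial embeddings are random stand-ins for i-vectors. The function names `kmeans`, `train_encoder`, and `iterative_pseudo_labeling` are hypothetical.

```python
import numpy as np


def kmeans(x, k, n_iter=20, seed=0):
    """Plain k-means; returns one cluster index (pseudo-label) per row of x."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels


def train_encoder(features, pseudo_labels, k, lr=0.5, n_steps=200):
    """Fit a linear softmax classifier on the pseudo-labels (assumed stand-in model)."""
    n, d = features.shape
    w = np.zeros((d, k))
    onehot = np.eye(k)[pseudo_labels]
    for _ in range(n_steps):
        logits = features @ w
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        # Gradient step on the mean cross-entropy loss.
        w -= lr * features.T @ (probs - onehot) / n
    return w


def iterative_pseudo_labeling(features, initial_embeddings, k, n_rounds=3):
    """Each round: cluster current embeddings, train on the labels, re-embed."""
    embeddings = initial_embeddings
    for _ in range(n_rounds):
        pseudo_labels = kmeans(embeddings, k)
        w = train_encoder(features, pseudo_labels, k)
        embeddings = features @ w  # the improved model re-embeds the data
    return embeddings, pseudo_labels


# Toy usage: random features, with a slice of them posing as initial i-vectors.
rng = np.random.default_rng(0)
feats = rng.normal(size=(60, 8))
emb, labels = iterative_pseudo_labeling(feats, feats[:, :4], k=3, n_rounds=2)
```

The only part that changes between rounds is the embedding fed to the clustering step, which is why the choice of the first embedding (here, the i-vector stand-in) is what bootstraps the whole process.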

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing

Methods: Iterative Pseudo-Labeling