Speaker-IPL: Unsupervised Learning of Speaker Characteristics with i-Vector based Pseudo-Labels
Zakaria Aldeneh, Takuya Higuchi, Jee-weon Jung, Li-Wei Chen, Stephen Shum, Ahmed Hussen Abdelaziz, Shinji Watanabe, Tatiana Likhomanenko, Barry-John Theobald

TL;DR
This paper demonstrates that iterative pseudo-labeling using a simple i-vector model can effectively learn speaker representations in an unsupervised manner, rivaling more complex self-supervised approaches.
Contribution
It shows that a basic i-vector model suffices for IPL in unsupervised speaker recognition, reducing reliance on complex self-supervised models.
Findings
i-vector based IPL achieves competitive speaker verification performance
Systematic analysis of how individual components affect IPL effectiveness
Simple models can rival state-of-the-art in unsupervised speaker learning
Abstract
Iterative self-training, or iterative pseudo-labeling (IPL) -- using an improved model from the current iteration to provide pseudo-labels for the next iteration -- has proven to be a powerful approach to enhance the quality of speaker representations. Recent applications of IPL in unsupervised speaker recognition start with representations extracted from very elaborate self-supervised methods (e.g., DINO). However, training such strong self-supervised models is not straightforward (they require hyper-parameter tuning and may not generalize to out-of-domain data) and, moreover, may not be needed at all. To this end, we show that the simple, well-studied, and established i-vector generative model is enough to bootstrap the IPL process for the unsupervised learning of speaker representations. We also systematically study the impact of other components on the IPL process, which includes…
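The IPL loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes pseudo-labels are obtained by k-means clustering of the current embeddings, uses a linear softmax classifier as a stand-in for the speaker encoder, and takes any initial embedding matrix in place of real i-vectors. The names `kmeans`, `train_encoder`, and `speaker_ipl` are hypothetical.

```python
import numpy as np


def kmeans(x, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster id per row of x."""
    rng = np.random.default_rng(seed)
    # Initialise centers from k distinct data points (fancy indexing copies).
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return labels


def train_encoder(feats, pseudo_labels, k, lr=0.5, n_steps=200):
    """Fit a linear softmax classifier on the pseudo-labels.

    Assumption: this toy linear model stands in for a neural speaker encoder.
    """
    w = np.zeros((feats.shape[1], k))
    onehot = np.eye(k)[pseudo_labels]
    for _ in range(n_steps):
        logits = feats @ w
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        # Gradient of the mean cross-entropy with respect to w.
        w -= lr * feats.T @ (probs - onehot) / len(feats)
    return w


def speaker_ipl(feats, init_embeddings, k, n_rounds=3):
    """Iterative pseudo-labeling: cluster, train on cluster ids, re-embed."""
    embeddings = init_embeddings  # e.g. i-vectors in the paper's setting
    history = []
    for r in range(n_rounds):
        # 1. Pseudo-labels from the current embeddings (assumed: k-means).
        pseudo_labels = kmeans(embeddings, k, seed=r)
        # 2. Train a new model on those pseudo-labels.
        w = train_encoder(feats, pseudo_labels, k)
        # 3. The new model's outputs become the next round's embeddings.
        embeddings = feats @ w
        history.append(pseudo_labels)
    return embeddings, history
```

Each round reuses the same three steps, so the only thing that differs between bootstrapping from i-vectors and bootstrapping from a self-supervised model is what is passed as `init_embeddings`.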
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing
Methods: Iterative Pseudo-Labeling
