An Investigation of Distribution Alignment in Multi-Genre Speaker Recognition
Zhenyu Zhou, Junhui Chen, Namin Wang, Lantian Li, Dong Wang

TL;DR
This paper studies the effectiveness of distribution alignment methods in multi-genre speaker recognition and finds that current approaches yield limited and inconsistent improvements across genres.
Contribution
It provides a comprehensive analysis of mainstream distribution alignment techniques on multi-genre data, highlighting their limitations and the need for more effective solutions.
Findings
Within-between distribution alignment (WBDA) performs relatively better than the other investigated methods on CN-Celeb
None of the methods consistently improve performance across all cases
Distribution alignment alone may not fully solve multi-genre recognition challenges
Abstract
Multi-genre speaker recognition is becoming increasingly popular because it better represents the complexities of real-world applications. However, a major challenge is the significant shift in the distribution of speaker vectors across different genres. While distribution alignment is a common approach to address this challenge, previous studies have mainly focused on aligning a source domain with a target domain, and how these methods perform on multi-genre data is unknown. This paper presents a comprehensive study of mainstream distribution alignment methods on multi-genre data, where multiple distributions need to be aligned. We analyze various methods both qualitatively and quantitatively. Our experiments on the CN-Celeb dataset show that within-between distribution alignment (WBDA) performs relatively better. However, we also found that none of the investigated methods…
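To make the idea of aligning multiple genre distributions concrete, here is a minimal sketch of one very simple alignment baseline: normalizing speaker vectors per genre to zero mean and unit variance, so that every genre is mapped toward a shared distribution. This is an illustrative assumption, not WBDA or any specific method evaluated in the paper; the function name `align_genres`, the toy genre labels, and the synthetic data are all hypothetical.

```python
import numpy as np

def align_genres(vectors, genres, eps=1e-8):
    """Per-genre mean/variance normalization (a simple illustrative baseline,
    not the paper's method). Each genre's vectors are shifted and scaled
    independently so that all genres share zero mean and unit variance."""
    vectors = np.asarray(vectors, dtype=float)
    genres = np.asarray(genres)
    aligned = np.empty_like(vectors)
    for g in np.unique(genres):
        mask = genres == g
        mu = vectors[mask].mean(axis=0)     # genre-specific mean
        sigma = vectors[mask].std(axis=0)   # genre-specific std per dimension
        aligned[mask] = (vectors[mask] - mu) / (sigma + eps)
    return aligned

# Toy data: two hypothetical genres with different shifts and spreads.
rng = np.random.default_rng(0)
genre_a = rng.normal(loc=2.0, scale=1.0, size=(200, 4))
genre_b = rng.normal(loc=-1.0, scale=3.0, size=(200, 4))
x = np.vstack([genre_a, genre_b])
labels = np.array(["genre_a"] * 200 + ["genre_b"] * 200)

aligned = align_genres(x, labels)
```

After this step both toy genres have matching first- and second-order statistics per dimension. Matching such statistics does not guarantee that speaker identity is preserved or that recognition improves, which is consistent with the summary's point that alignment alone may not solve the multi-genre problem.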
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
