An Investigation of Distribution Alignment in Multi-Genre Speaker Recognition

Zhenyu Zhou; Junhui Chen; Namin Wang; Lantian Li; Dong Wang

arXiv:2309.14158 · cs.SD · September 26, 2023



TL;DR

This paper studies the effectiveness of distribution alignment methods in multi-genre speaker recognition and finds that current approaches yield only limited and inconsistent improvements across different genre distributions.

Contribution

The paper provides a comprehensive analysis of mainstream distribution alignment techniques on multi-genre data, highlighting their limitations and the need for more effective solutions.

Findings

1. Within-between distribution alignment (WBDA) performs relatively better.

2. None of the methods consistently improves performance across all cases.

3. Distribution alignment alone may not fully solve the challenges of multi-genre recognition.

Abstract

Multi-genre speaker recognition is becoming increasingly popular because it better represents the complexity of real-world applications. However, a major challenge is the significant shift in the distribution of speaker vectors across genres. Distribution alignment is a common approach to this challenge, but previous studies have mainly focused on aligning a source domain with a target domain, and how these methods perform on multi-genre data is unknown. This paper presents a comprehensive study of mainstream distribution alignment methods on multi-genre data, where multiple distributions must be aligned. We analyze these methods both qualitatively and quantitatively. Our experiments on the CN-Celeb dataset show that within-between distribution alignment (WBDA) performs relatively better. However, we also found that none of the investigated methods…
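The abstract does not describe how WBDA or the other investigated methods work. To make the idea of aligning *multiple* distributions concrete, the following is a minimal, hypothetical sketch of the simplest possible scheme: standardizing speaker vectors separately within each genre so that every genre ends up with zero mean and unit variance per dimension. This is not WBDA and is not claimed to be one of the methods studied in the paper; the function name `align_per_genre` and the genre labels are illustrative assumptions.

```python
from statistics import mean, pstdev


def align_per_genre(vectors, genres, eps=1e-8):
    """Hypothetical per-genre standardization (not the paper's method).

    vectors: list of equal-length lists of floats (e.g. speaker vectors)
    genres:  list of genre labels, one per vector
    Returns new vectors in which, within every genre, each dimension
    has zero mean and (population) unit standard deviation.
    """
    dim = len(vectors[0])

    # Group vector indices by their genre label.
    groups = {}
    for i, g in enumerate(genres):
        groups.setdefault(g, []).append(i)

    aligned = [None] * len(vectors)
    for idx in groups.values():
        # Per-dimension statistics estimated from this genre only,
        # so each genre's distribution is mapped to the same target.
        mus = [mean(vectors[i][d] for i in idx) for d in range(dim)]
        sds = [pstdev([vectors[i][d] for i in idx]) for d in range(dim)]
        for i in idx:
            # eps guards against division by zero for constant dimensions.
            aligned[i] = [(vectors[i][d] - mus[d]) / (sds[d] + eps)
                          for d in range(dim)]
    return aligned


# Two toy "genres" whose vectors are shifted and scaled differently.
vecs = [[1.0, 2.0], [3.0, 4.0], [10.0, 0.0], [14.0, 2.0]]
labels = ["interview", "interview", "vlog", "vlog"]
print(align_per_genre(vecs, labels))
```

After alignment, both toy genres map to approximately `[-1, -1]` and `[1, 1]`, removing the shift between them. Real alignment methods typically also match covariance structure or use between-speaker and within-speaker statistics, which this first-order sketch ignores.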


Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing