Disentangled representation learning for multilingual speaker recognition

Kihyun Nam; Youkyum Kim; Jaesung Huh; Hee Soo Heo; Jee-weon Jung; Joon Son Chung

arXiv:2211.00437·eess.AS·June 8, 2023

TL;DR

This paper introduces a novel disentangled learning approach for multilingual speaker recognition, addressing the challenge of recognizing speakers across different languages by disentangling language information from speaker features.

Contribution

It presents a large-scale bilingual speaker evaluation set and a new disentanglement learning method combining adversarial and metric learning techniques without manual language labels.

Findings

1. Effective disentanglement of language and speaker features.
2. Improved recognition accuracy in bilingual scenarios.
3. A new evaluation benchmark for multilingual speaker recognition.

Abstract

The goal of this paper is to learn robust speaker representations for bilingual speaking scenarios. The majority of the world's population speak at least two languages; however, most speaker recognition systems fail to recognise the same speaker when they speak in different languages. Popular speaker recognition evaluation sets do not consider the bilingual scenario, making it difficult to analyse the effect of bilingual speakers on speaker recognition performance. In this paper, we publish a large-scale evaluation set named VoxCeleb1-B derived from VoxCeleb that considers bilingual scenarios. We introduce an effective disentanglement learning strategy that combines adversarial and metric learning-based methods. This approach addresses the bilingual situation by disentangling language-related information from the speaker representation while ensuring stable speaker representation learning.…

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing