Leveraging Language Information for Target Language Extraction

Mehmet Sinan Yıldırım; Ruijie Tao; Wupeng Wang; Junyi Ao; Haizhou Li

arXiv:2511.01652 · eess.AS · November 4, 2025

Open Access

TL;DR

This paper introduces an end-to-end framework that uses language knowledge from speech pre-trained models to improve target language extraction from multilingual audio mixtures, reporting over 1.2 dB SI-SNR improvement for English and German extraction.

Contribution

It proposes a novel approach that uses language knowledge from pre-trained models to enhance extraction accuracy, and provides the first multilingual dataset for this task.

Findings

01. Achieves over 1.2 dB SI-SNR improvement for English and German extraction.

02. Constructs the first publicly available multilingual dataset for Target Language Extraction.

03. Demonstrates the effectiveness of language knowledge guidance in speech extraction.
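
The first finding is stated in terms of SI-SNR improvement. As a reading aid, here is a minimal sketch of how scale-invariant signal-to-noise ratio and its improvement over the unprocessed mixture are commonly computed. It follows the standard definition of the metric and is not the authors' evaluation code; the function names `si_snr` and `si_snr_improvement`, the `eps` stabiliser, and the toy signals are assumptions made for illustration.

```python
import math
from typing import Sequence


def si_snr(estimate: Sequence[float], target: Sequence[float], eps: float = 1e-8) -> float:
    """Scale-invariant SNR in dB (standard definition; hypothetical helper)."""
    if len(estimate) != len(target):
        raise ValueError("signals must have the same length")
    n = len(target)
    # Remove the mean of each signal so a DC offset does not affect the score.
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]
    # Project the estimate onto the target: this is the "signal" part.
    dot = sum(a * b for a, b in zip(e, t))
    scale = dot / (sum(b * b for b in t) + eps)
    s_target = [scale * b for b in t]
    # Whatever is left after the projection counts as noise.
    e_noise = [a - s for a, s in zip(e, s_target)]
    signal_energy = sum(s * s for s in s_target)
    noise_energy = sum(x * x for x in e_noise)
    return 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))


def si_snr_improvement(
    estimate: Sequence[float], mixture: Sequence[float], target: Sequence[float]
) -> float:
    """SI-SNR of the extracted signal minus SI-SNR of the raw mixture, in dB."""
    return si_snr(estimate, target) - si_snr(mixture, target)


# Toy example (made-up signals): the interferer is orthogonal to the target.
target = [1.0, -1.0, 1.0, -1.0]
interferer = [1.0, 1.0, -1.0, -1.0]
mixture = [t + i for t, i in zip(target, interferer)]
# Pretend an extractor halved the interferer's amplitude.
estimate = [t + 0.5 * i for t, i in zip(target, interferer)]
improvement_db = si_snr_improvement(estimate, mixture, target)
```

In this toy case the interferer's energy drops by a factor of four, so `improvement_db` comes out at roughly 6 dB. Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate leaves the score unchanged, which is why the metric suits extraction systems whose output gain is arbitrary.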

Abstract

Target Language Extraction aims to extract speech in a specific language from a mixture waveform that contains multiple speakers speaking different languages. The human auditory system is adept at this task when it has knowledge of the particular language. However, the performance of conventional extraction systems is limited by the lack of this prior knowledge. Speech pre-trained models, which capture rich linguistic and phonetic representations from large-scale in-the-wild corpora, can provide this missing language knowledge to these systems. In this work, we propose a novel end-to-end framework to leverage language knowledge from speech pre-trained models. This knowledge is used to guide the extraction model to better capture the target language characteristics, thereby improving extraction quality. To demonstrate the effectiveness of our proposed approach, we construct…

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing