Seamless Language Expansion: Enhancing Multilingual Mastery in Self-Supervised Models

Jing Xu; Minglin Wu; Xixin Wu; Helen Meng

arXiv:2406.14092 · cs.CL · August 25, 2025

Open Access

TL;DR

This paper introduces methods to adapt self-supervised speech models to new languages efficiently while preserving their original multilingual capabilities, demonstrated through improved Mandarin speech synthesis performance.

Contribution

It presents novel adaptation and preservation strategies using LoRA and data re-clustering for extending SSL models to new languages without losing existing language skills.
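
As a rough illustration of the LoRA idea mentioned above, the following is a minimal sketch of a linear layer whose pretrained weight stays frozen while a low-rank update is trained alongside it. It is not the paper's implementation: the class name `LoRALinear`, the `rank` and `alpha` values, and the initialization are assumptions chosen for illustration, and the summary does not say which layers of mHuBERT receive the adapters.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


class LoRALinear:
    """Frozen linear layer plus a low-rank update (hypothetical sketch).

    Output: y = x W^T + (alpha / rank) * x A^T B^T
    Only A and B would be trained; W keeps its pretrained values.
    """

    def __init__(self, weight, rank=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(weight), len(weight[0])
        self.weight = weight  # frozen pretrained weight, shape (out_dim, in_dim)
        # A starts small and random, B starts at zero, so the update is
        # initially zero and the layer behaves exactly like the frozen one.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]
        self.scale = alpha / rank  # assumed scaling convention

    def forward(self, x):
        """Apply the layer to a batch x of shape (batch, in_dim)."""
        base = matmul(x, transpose(self.weight))
        low_rank = matmul(matmul(x, transpose(self.A)), transpose(self.B))
        return [
            [b + self.scale * u for b, u in zip(base_row, update_row)]
            for base_row, update_row in zip(base, low_rank)
        ]


layer = LoRALinear([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], rank=2)
print(layer.forward([[1.0, 1.0]]))  # equals the frozen layer's output while B is zero
```

Because `B` starts at zero, adding the adapter does not change the model's output before training, which is one reason this kind of adaptation is a natural fit when existing abilities should be preserved.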

Findings

01

Improved speech re-synthesis quality for Mandarin, with MOS increased by about 1.6

02

Relative WER reduced by up to 61.72%

03

Preservation strategies maintain performance on original languages
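
The WER figure above is a relative reduction, which differs from an absolute drop in percentage points. The sketch below shows the arithmetic under that reading; the function name `relative_reduction` and the input values are hypothetical and are not results from the paper.

```python
def relative_reduction(baseline, adapted):
    """Return the relative reduction from baseline to adapted, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - adapted) / baseline * 100.0


# Hypothetical WER values (in percent), chosen only to illustrate the formula.
baseline_wer = 40.0
adapted_wer = 15.0
print(relative_reduction(baseline_wer, adapted_wer))  # 62.5
```

In this made-up case the absolute drop is 25 percentage points, while the relative reduction is 62.5%.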

Abstract

Self-supervised learning (SSL) models have shown strong performance on various downstream tasks. However, they are typically developed for a limited set of languages and may encounter new languages in real-world use. Developing an SSL model for each new language is costly. Thus, it is vital to find out how to efficiently adapt an existing SSL model to a new language without impairing its original abilities. We propose adaptation methods that integrate LoRA into existing SSL models to extend them to a new language. We also develop preservation strategies, including data combination and re-clustering, to retain abilities on existing languages. Applied to mHuBERT, we investigate their effectiveness on the speech re-synthesis task. Experiments show that our adaptation methods enable mHuBERT to be applied to a new language (Mandarin), with the MOS increased by about 1.6 and the WER reduced by up to 61.72% in relative terms. Also, our…

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems · Topic Modeling