Efficient Multilingual ASR Finetuning via LoRA Language Experts

Jiahong Li; Yiwen Shao; Jianheng Zhuo; Chenda Li; Liliang Tang; Dong Yu; Yanmin Qian

arXiv:2506.21555 · cs.CL · June 30, 2025

Open Access

TL;DR

This paper introduces a LoRA-based finetuning framework for Whisper-based multilingual ASR that reduces interference between languages, reporting relative performance gains of approximately 10% in language-aware scenarios and 15% in language-agnostic scenarios over standard finetuning.

Contribution

The paper prepares per-language LoRA experts and combines them through either LoRA expert fusion or knowledge distillation, giving an efficient way to finetune multilingual ASR models while mitigating the curse of multilinguality.

Findings

01

Achieves approximately 10% relative performance gain in language-aware scenarios

02

Achieves approximately 15% relative performance gain in language-agnostic scenarios

03

Demonstrates effectiveness on Whisper-based multilingual ASR models
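As a reading aid for the figures above, here is a minimal sketch of how a relative gain is usually computed from two error rates. It assumes the metric is an error rate such as WER, which this summary does not state. The function name `relative_gain` and the numbers are hypothetical illustrations, not results from the paper.

```python
def relative_gain(baseline_err: float, new_err: float) -> float:
    """Relative reduction in error rate with respect to the baseline.

    Assumes lower is better (e.g. a word error rate given in percent).
    """
    if baseline_err <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline_err - new_err) / baseline_err


# Hypothetical numbers chosen only to illustrate the arithmetic:
# going from an error rate of 10.0 to 9.0 is a 10% relative gain,
# and going from 20.0 to 17.0 is a 15% relative gain.
gain_a = relative_gain(10.0, 9.0)
gain_b = relative_gain(20.0, 17.0)
print(f"{gain_a:.0%}, {gain_b:.0%}")
```

A relative gain differs from an absolute one: in the first hypothetical case the absolute improvement is only one point of error rate.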

Abstract

Recent advancements in deep learning have significantly enhanced multilingual automatic speech recognition (ASR) due to the development of advanced model architectures and available large-scale multilingual datasets. Despite that, multilingual ASR still suffers from the curse of multilinguality in that different languages tend to interfere with each other, making it difficult for the ASR model to identify multiple languages effectively while sharing model capacity across them. This paper proposes an efficient finetuning framework for customized multilingual ASR via prepared LoRA language experts based on Whisper. Through LoRA expert fusion or knowledge distillation, our approach achieves better recognition performance on target languages than standard fine-tuning methods. Experimental results demonstrate that the proposed models yield approximately 10% and 15% relative performance…
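To make the idea of combining LoRA language experts concrete, here is a minimal NumPy sketch. It shows the standard LoRA low-rank update and one simple way several per-language updates could be merged into a frozen weight matrix: a normalized weighted sum. The merging rule, the function names `lora_delta` and `fuse_experts`, the scaling `alpha`, and all shapes are assumptions made for illustration. The abstract does not specify how the paper's fusion works, so this should not be read as the authors' method.

```python
import numpy as np


def lora_delta(A: np.ndarray, B: np.ndarray, alpha: float) -> np.ndarray:
    """Standard LoRA update (alpha / r) * B @ A, with A: (r, d_in), B: (d_out, r)."""
    rank = A.shape[0]
    return (alpha / rank) * (B @ A)


def fuse_experts(W0, experts, weights, alpha=16.0):
    """Add a normalized weighted sum of per-language LoRA updates to frozen W0.

    Illustrative fusion rule only (an assumption, not taken from the paper).
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    W = W0.copy()  # the base weights themselves stay untouched
    for (A, B), w in zip(experts, weights):
        W += w * lora_delta(A, B, alpha)
    return W


rng = np.random.default_rng(0)
d_out, d_in, r = 10, 12, 2

W0 = rng.normal(size=(d_out, d_in))  # stand-in for one frozen projection matrix
# Three hypothetical language experts, each a pair (A, B) of low-rank factors.
experts = [
    (rng.normal(size=(r, d_in)), rng.normal(size=(d_out, r)))
    for _ in range(3)
]

W_fused = fuse_experts(W0, experts, weights=[0.5, 0.3, 0.2])
print(W_fused.shape)
```

Because each expert contributes a matrix of rank at most `r`, the fused update has rank at most `3 * r` here, which is why keeping one small adapter per language stays cheap compared with storing a fully finetuned model for each language.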

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Emotion and Mood Recognition