MOSA: Mixtures of Simple Adapters Outperform Monolithic Approaches in LLM-based Multilingual ASR
Junjie Li, Jing Peng, Yangui Fang, Shuai Wang, Kai Yu

TL;DR
This paper introduces MOSA, a mixture of simple adapters for multilingual LLM-based ASR, which outperforms monolithic approaches by effectively sharing knowledge across languages and reducing parameter interference.
Contribution
MOSA replaces the single projector with a mixture of simple adapters whose experts learn either language-specific or language-shared knowledge, improving performance and parameter efficiency over single-adapter methods.
Findings
MOSA-Base reduces WER by 15.4% relative to Ideal-LLM Base.
MOSA achieves a 13.3% WER reduction with only 60% of the parameters.
MOSA outperforms monolithic approaches across all tested languages.
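The percentages above are relative reductions, meaning the drop in WER is divided by the baseline WER rather than reported in absolute points. A minimal sketch of that arithmetic follows; the function name and the example WER values are hypothetical and chosen only to illustrate the calculation, they are not figures from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the WER reduction as a percentage of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical WER values for illustration only; not results from the paper.
example = relative_wer_reduction(baseline_wer=10.0, new_wer=8.46)
print(f"{example:.1f}% relative reduction")
```

Under these made-up numbers, an absolute drop of 1.54 WER points corresponds to a relative reduction of about 15.4%, which is why relative and absolute figures should not be compared directly.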
Abstract
LLM-based ASR overcomes multilingual data scarcity by projecting speech representations into the LLM space to leverage its robust semantic and reasoning capabilities. However, while previous approaches typically enhance performance by scaling data or model parameters, a single projector often struggles to effectively align representations across different languages. In this work, we propose an MoE-based projector named MOSA (Mixture of Simple Adapters). By aggregating multiple simple adapters, this architecture enables different experts to specialize in learning either language-shared or language-specific knowledge. This approach not only mitigates parameter interference between languages but also facilitates positive transfer from high-resource to low-resource languages, effectively alleviating data scarcity issues. Experimental results demonstrate that MOSA-Base achieves a 15.4%…
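The idea of an MoE-based projector that aggregates several simple adapters can be sketched roughly as below. This is an illustration under stated assumptions, not the authors' implementation: the summary does not specify the adapter structure, the routing scheme, or any dimensions, so the sketch assumes each adapter is a single linear map, the router is a dense softmax gate over all experts, and the class names `SimpleAdapter` and `MixtureOfAdapters` are hypothetical. It uses only the Python standard library so the mechanics stay visible.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(z: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class SimpleAdapter:
    """One expert. ASSUMPTION: a single linear map from speech features to LLM space."""

    def __init__(self, in_dim: int, out_dim: int, rng: random.Random) -> None:
        self.w = random_matrix(out_dim, in_dim, rng)

    def __call__(self, x: Vector) -> Vector:
        return matvec(self.w, x)


class MixtureOfAdapters:
    """ASSUMPTION: dense softmax gating; every expert contributes to every frame."""

    def __init__(self, in_dim: int, out_dim: int, num_experts: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.experts = [SimpleAdapter(in_dim, out_dim, rng) for _ in range(num_experts)]
        self.router = random_matrix(num_experts, in_dim, rng)

    def gate(self, x: Vector) -> Vector:
        """Per-expert mixing weights for one speech frame (they sum to 1)."""
        return softmax(matvec(self.router, x))

    def __call__(self, x: Vector) -> Vector:
        weights = self.gate(x)
        outputs = [expert(x) for expert in self.experts]
        out_dim = len(outputs[0])
        # Weighted sum of the expert outputs, one output dimension at a time.
        return [sum(g * out[j] for g, out in zip(weights, outputs)) for j in range(out_dim)]


# Toy usage with made-up sizes: a 4-dim speech frame projected to a 6-dim LLM embedding.
projector = MixtureOfAdapters(in_dim=4, out_dim=6, num_experts=3)
frame = [0.5, -1.0, 0.25, 2.0]
embedding = projector(frame)
```

Because the gate depends on the input frame, different experts can end up weighted more heavily for different languages, while an expert that receives weight across many languages can carry shared knowledge. Whether MOSA uses dense or sparse routing is not stated in this summary.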
