Group then Scale: Dynamic Mixture-of-Experts Multilingual Language Model

Chong Li; Yingzhuo Deng; Jiajun Zhang; Chengqing Zong

arXiv:2506.12388·cs.CL·June 17, 2025

TL;DR

This paper introduces a dynamic mixture-of-experts approach for multilingual LLMs that groups similar languages to reduce negative transfer and improve performance with fewer parameters.

Contribution

It proposes a novel method to dynamically group and scale model parameters based on language similarity, addressing the multilinguality curse.

Findings

1. Reduces negative transfer among languages.

2. Boosts multilingual performance with fewer parameters.

3. Enhances language adaptation and inference efficiency.

Abstract

The curse of multilinguality is a fundamental problem of multilingual Large Language Models (LLMs), in which competition among a large number of languages degrades performance. It stems mainly from limited capacity and negative transfer between dissimilar languages. To address this issue, we propose a method that dynamically groups languages and scales up the parameters of a multilingual LLM while boosting positive transfer among similar languages. Specifically, the model is first tuned on monolingual corpora to determine the parameter deviation in each layer and to quantify the similarity between languages. Layers with larger deviations are extended to mixture-of-experts layers to reduce competition between languages, with each expert module serving one group of similar languages. Experimental results on 18 to 128 languages show that our method reduces the negative transfer between languages…
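The pipeline the abstract describes (measure per-layer parameter deviation after monolingual tuning, pick the layers that deviate most, then group similar languages so each group shares one expert) can be illustrated with a toy sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the L2 norm as the deviation measure, cosine similarity between deviation vectors as the language-similarity measure, the greedy threshold grouping, the helper names, and the made-up parameter values.

```python
import math

def delta(base, tuned):
    """Per-layer deviation vectors: tuned parameters minus base parameters."""
    return [[t - b for b, t in zip(bl, tl)] for bl, tl in zip(base, tuned)]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def cosine(u, v):
    nu, nv = norm(u), norm(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def layer_deviation(deltas_by_lang):
    """Mean L2 norm of the deviation in each layer, averaged over languages (assumed metric)."""
    langs = list(deltas_by_lang)
    n_layers = len(deltas_by_lang[langs[0]])
    return [
        sum(norm(deltas_by_lang[lang][i]) for lang in langs) / len(langs)
        for i in range(n_layers)
    ]

def top_layers(deviation, k):
    """Indices of the k layers with the largest deviation, in ascending order."""
    order = sorted(range(len(deviation)), key=lambda i: deviation[i], reverse=True)
    return sorted(order[:k])

def group_languages(deltas_by_lang, layer, threshold):
    """Greedy grouping (assumed): join the first group whose seed language is similar enough."""
    groups = []
    for lang, d in deltas_by_lang.items():
        for group in groups:
            seed = deltas_by_lang[group[0]][layer]
            if cosine(d[layer], seed) >= threshold:
                group.append(lang)
                break
        else:
            groups.append([lang])
    return groups

# Toy model: 3 layers with 2 parameters each; values are invented for illustration.
base = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
tuned = {
    "es": [[0.1, 0.0], [1.0, 0.1], [0.0, 0.1]],
    "pt": [[0.0, 0.1], [0.9, 0.2], [0.1, 0.0]],
    "zh": [[0.1, 0.1], [-0.1, 1.0], [0.0, 0.1]],
}

deltas = {lang: delta(base, params) for lang, params in tuned.items()}
moe_layers = top_layers(layer_deviation(deltas), k=1)
groups = group_languages(deltas, layer=moe_layers[0], threshold=0.8)
# One expert per group in the selected layer: map each language to its expert index.
expert_for = {lang: idx for idx, group in enumerate(groups) for lang in group}
print(moe_layers, groups, expert_for)
```

In this toy setup the middle layer deviates most, so it is the one that would be extended to a mixture-of-experts layer, and the two languages whose deviations point in a similar direction end up sharing an expert. A real system would operate on full weight tensors and would need the paper's actual deviation and similarity definitions.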

Taxonomy

Topics: Topic Modeling · Expert finding and Q&A systems