Less, but Better: Efficient Multilingual Expansion for LLMs via Layer-wise Mixture-of-Experts
Xue Zhang, Yunlong Liang, Fandong Meng, Songming Zhang, Yufeng Chen, Jinan Xu, Jie Zhou

TL;DR
This paper introduces LayerMoE, a layer-wise expert allocation method for expanding LLMs to new languages. By exploiting how similar languages are at each layer, it adds fewer expert parameters and mitigates forgetting of previously learned languages.
Contribution
The paper proposes a layer-wise expert allocation algorithm (LayerMoE) that sets the number of new experts in each layer according to language similarity, making multilingual LLM expansion more parameter-efficient while improving performance.
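To make the idea of layer-wise allocation concrete, here is a minimal sketch, not the authors' algorithm. This summary only states that experts are allocated per layer based on language similarity. The direction assumed here (layers where the new and old languages are less similar receive more new experts), the proportional weighting, and the names `allocate_experts`, `similarity`, `budget`, and `min_per_layer` are all illustrative assumptions.

```python
from typing import List


def allocate_experts(similarity: List[float], budget: int,
                     min_per_layer: int = 1) -> List[int]:
    """Split `budget` new experts across layers (hypothetical helper).

    `similarity[i]` is an assumed per-layer score in [0, 1] describing how
    similar new-language and old-language representations are in layer i.
    Assumption: lower similarity means the layer gets more new experts.
    """
    n = len(similarity)
    if budget < n * min_per_layer:
        raise ValueError("budget too small for the per-layer minimum")

    # Weight each layer by its dissimilarity (an illustrative choice).
    weights = [max(1.0 - s, 0.0) for s in similarity]
    total = sum(weights)
    remaining = budget - n * min_per_layer
    if total == 0:
        shares = [remaining / n] * n
    else:
        shares = [remaining * w / total for w in weights]

    # Largest-remainder rounding so the counts sum exactly to the budget.
    counts = [min_per_layer + int(s) for s in shares]
    leftover = budget - sum(counts)
    by_fraction = sorted(range(n), key=lambda i: shares[i] - int(shares[i]),
                         reverse=True)
    for i in by_fraction[:leftover]:
        counts[i] += 1
    return counts


# Four layers; the third has the lowest similarity, so it gets the most experts.
counts = allocate_experts([0.9, 0.5, 0.1, 0.5], budget=12)
print(counts)
```

A uniform baseline would give every layer the same number of experts. A scheme like this spends the same or a smaller budget where the languages differ most, which is the kind of saving the findings below report.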
Findings
Outperforms previous methods in the single-expansion setting while using 60% fewer experts.
Uses 33.3% fewer experts in the lifelong-expansion setting.
Mitigates forgetting of previously learned languages during expansion.
Abstract
Continually expanding new languages for existing large language models (LLMs) is a promising yet challenging approach to building powerful multilingual LLMs. The biggest challenge is to make the model continuously learn new languages while preserving the proficient ability of old languages. To achieve this, recent work utilizes the Mixture-of-Experts (MoE) architecture to expand new languages by adding new experts and avoid catastrophic forgetting of old languages by routing corresponding tokens to the original model backbone (old experts). Although intuitive, this kind of method is parameter-costly when expanding new languages and still inevitably impacts the performance of old languages. To address these limitations, we analyze the language characteristics of different layers in LLMs and propose a layer-wise expert allocation algorithm (LayerMoE) to determine the appropriate number of…
Taxonomy
Topics: Topic Modeling · Advanced Graph Neural Networks · Machine Learning in Materials Science
