Understanding Multilingualism in Mixture-of-Experts LLMs: Routing Mechanism, Expert Specialization, and Layerwise Steering
Yuxin Chen, Zhengzhou Cai, Xiangtian Ji, Weixiang Zhao, An Zhang, Xiang Wang, Tat-Seng Chua

TL;DR
This paper systematically analyzes how mixture-of-experts large language models process multiple languages, revealing structured routing behaviors and expert specialization patterns, and proposes a routing-guided method to improve multilingual performance.
Contribution
The paper uncovers structured routing and expert specialization mechanisms in multilingual MoE models and introduces a routing-guided steering method to improve multilingual performance.
Findings
Routing aligns with linguistic families.
Expert utilization follows layerwise patterns.
Middle layers serve as language-agnostic hubs.
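As an illustration of the first finding, the sketch below shows one way routing similarity between languages could be measured in a single MoE layer. It is not the authors' procedure: the Jensen-Shannon divergence, the helper names (`expert_utilization`, `js_divergence`, `toy_routing`), the language codes, and the synthetic router biases are all assumptions made for this example. In practice, the top-k expert indices would be recorded from a real model's router instead of being simulated.

```python
import numpy as np


def expert_utilization(topk_ids, num_experts):
    """Fraction of routing slots assigned to each expert.

    topk_ids: integer array of shape (num_tokens, k) holding the experts
    selected by the router for each token in one MoE layer.
    """
    counts = np.bincount(np.asarray(topk_ids).ravel(), minlength=num_experts)
    return counts / counts.sum()


def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence (base 2, so the value lies in [0, 1])."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        return float(np.sum(a * np.log2(a / b)))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def toy_routing(rng, bias, k, num_tokens=2000):
    """Simulate top-k routing: noisy logits around a per-language bias.

    Hypothetical stand-in for router decisions logged from a real model.
    """
    logits = bias[None, :] + rng.gumbel(size=(num_tokens, bias.size))
    return np.argsort(-logits, axis=1)[:, :k]


rng = np.random.default_rng(0)
num_experts, k = 8, 2

# Synthetic biases (assumption): "es" and "it" favor the same experts,
# while "zh" favors a different pair.
romance_bias = np.array([3.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
other_bias = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 3.0, 3.0, 0.0])

util = {
    "es": expert_utilization(toy_routing(rng, romance_bias, k), num_experts),
    "it": expert_utilization(toy_routing(rng, romance_bias, k), num_experts),
    "zh": expert_utilization(toy_routing(rng, other_bias, k), num_experts),
}

for a, b in [("es", "it"), ("es", "zh")]:
    print(a, b, round(js_divergence(util[a], util[b]), 4))
```

Under these toy assumptions, the two languages that share a router bias end up with a smaller divergence than the pair that does not. Repeating the comparison per layer would give a layerwise view of how language-specific the routing is.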
Abstract
Mixture-of-Experts (MoE) architectures have shown strong multilingual capabilities, yet the internal mechanisms underlying performance gains and cross-language differences remain insufficiently understood. In this work, we conduct a systematic analysis of MoE models, examining routing behavior and expert specialization across languages and network depth. Our analysis reveals that multilingual processing in MoE models is highly structured: routing aligns with linguistic families, expert utilization follows a clear layerwise pattern, and high-resource languages rely on shared experts while low-resource languages depend more on language-exclusive experts despite weaker performance. Layerwise interventions further show that early and late MoE layers support language-specific processing, whereas middle layers serve as language-agnostic capacity hubs. Building on these insights, we propose a…
Taxonomy
Topics: Mobile Crowdsensing and Crowdsourcing · Expert finding and Q&A systems · Complex Network Analysis Techniques
