Unveiling Language Routing Isolation in Multilingual MoE Models for Interpretable Subnetwork Adaptation
Kening Zheng, Wei-Chieh Huang, Jiahao Huo, Zhonghao Li, Henry Peng Zou, Yibo Yan, Xin Zou, Jungang Li, Junzhuo Li, Hanrong Zhang, Xuming Hu, Philip S. Yu

TL;DR
This paper analyzes expert routing patterns in multilingual MoE models, revealing language-specific subnetworks and proposing RISE to enhance low-resource language performance by exploiting routing isolation.
Contribution
The paper introduces the concept of Language Routing Isolation, provides a layer-wise analysis of routing patterns, and proposes RISE (Routing Isolation-guided Subnetwork Enhancement) to identify and adapt language-specific expert subnetworks.
Findings
High- and low-resource languages tend to activate largely disjoint expert sets (Language Routing Isolation).
RISE improves low-resource language F1 scores by up to 10.85%.
Layer-wise routing patterns show convergence-divergence across depth.
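One way to picture the routing-isolation finding is to compare, layer by layer, the experts most often selected for two languages. The sketch below is illustrative only: it uses Jaccard overlap of the top-k most frequently routed experts as a stand-in measure, which is an assumption and not the paper's specificity or overlap score. The function names and the toy routing data are hypothetical.

```python
from collections import Counter


def top_experts(routed_expert_ids, k):
    """Return the set of the k experts selected most often in one layer."""
    counts = Counter(routed_expert_ids)
    return {expert for expert, _ in counts.most_common(k)}


def jaccard_overlap(a, b):
    """Jaccard overlap of two expert sets (0.0 = disjoint, 1.0 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def layerwise_overlap(routes_a, routes_b, k):
    """Per-layer overlap between two languages.

    routes_a / routes_b: one list per layer, each holding the expert ids
    the router chose for that language's tokens (hypothetical format).
    """
    return [
        jaccard_overlap(top_experts(layer_a, k), top_experts(layer_b, k))
        for layer_a, layer_b in zip(routes_a, routes_b)
    ]


# Toy routing traces, invented for illustration (3 layers, 8 experts).
high_resource = [[0, 0, 1, 1, 2], [3, 3, 4, 4, 5], [0, 0, 1, 1, 2]]
low_resource = [[5, 5, 6, 6, 7], [3, 3, 4, 4, 6], [6, 6, 7, 7, 0]]

overlaps = layerwise_overlap(high_resource, low_resource, k=2)
print(overlaps)
```

An overlap near 0 at a layer means the two languages rely on disjoint experts there, while values near 1 mean they share experts. Tracking this value across depth is one simple way to look for a convergence-divergence trend.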
Abstract
Mixture-of-Experts (MoE) models exhibit striking performance disparities across languages, yet the internal mechanisms driving these gaps remain poorly understood. In this work, we conduct a systematic analysis of expert routing patterns in MoE models, revealing a phenomenon we term Language Routing Isolation, in which high- and low-resource languages tend to activate largely disjoint expert sets. Through layer-stratified analysis, we further show that routing patterns exhibit a layer-wise convergence-divergence pattern across model depth. Building on these findings, we propose RISE (Routing Isolation-guided Subnetwork Enhancement), a framework that exploits routing isolation to identify and adapt language-specific expert subnetworks. RISE applies a tripartite selection strategy, using specificity scores to identify language-specific experts in shallow and deep layers and overlap scores…
