Part-Of-Speech Sensitivity of Routers in Mixture of Experts Models
Elie Antoine, Frédéric Béchet, Philippe Langlais

TL;DR
This paper examines how routers in Mixture of Experts (MoE) models route tokens in relation to their Part-of-Speech (POS) tags, finding expert specialization and high POS predictability from routing paths across different architectures.
Contribution
It provides the first comprehensive analysis of POS-based routing behavior in MoE models, demonstrating expert specialization and the predictive power of routing paths.
Findings
Experts specialize in specific POS categories.
Routing paths accurately predict POS tags.
POS-sensitive routing is observed consistently across the six MoE models studied.
Abstract
This study investigates the behavior of model-integrated routers in Mixture of Experts (MoE) models, focusing on how tokens are routed based on their linguistic features, specifically Part-of-Speech (POS) tags. The goal is to explore across different MoE architectures whether experts specialize in processing tokens with similar linguistic traits. By analyzing token trajectories across experts and layers, we aim to uncover how MoE models handle linguistic information. Findings from six popular MoE models reveal expert specialization for specific POS categories, with routing paths showing high predictive accuracy for POS, highlighting the value of routing paths in characterizing tokens.
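As a rough illustration of what "routing paths predicting POS" can mean, the sketch below counts which POS tags each (layer, expert) pair receives and then guesses a token's tag by letting the experts on its path vote. This is a deliberately simple stand-in and not the authors' classifier or pipeline. The toy data, the top-1 routing assumption, and the names `tokens`, `expert_pos_counts`, and `predict_pos` are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy data (not from the paper): each token has a POS tag and a
# routing path, i.e. the index of the top-1 expert selected at each MoE layer.
tokens = [
    ("NOUN", (0, 2, 1)),
    ("NOUN", (0, 2, 3)),
    ("VERB", (1, 0, 1)),
    ("VERB", (1, 0, 2)),
    ("DET",  (3, 3, 0)),
    ("NOUN", (0, 1, 1)),
]

def expert_pos_counts(samples):
    """Count how often each (layer, expert) pair receives each POS tag."""
    counts = defaultdict(Counter)
    for pos, path in samples:
        for layer, expert in enumerate(path):
            counts[(layer, expert)][pos] += 1
    return counts

def predict_pos(path, counts):
    """Guess a POS tag by summing each visited expert's POS distribution."""
    votes = Counter()
    for layer, expert in enumerate(path):
        dist = counts.get((layer, expert))
        if not dist:
            continue  # expert never seen at this layer: contributes no vote
        total = sum(dist.values())
        for pos, n in dist.items():
            votes[pos] += n / total
    return votes.most_common(1)[0][0] if votes else None

counts = expert_pos_counts(tokens)
print(counts[(0, 0)])                  # POS tags seen by expert 0 at layer 0
print(predict_pos((0, 2, 1), counts))  # vote-based guess for one path
```

An expert whose counter is dominated by a single tag is "specialized" in the sense the abstract describes. A real analysis would read routing decisions from the model's router outputs and evaluate predictions on held-out tokens.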
Taxonomy
Topics: Topic Modeling · Expert finding and Q&A systems · Opinion Dynamics and Social Influence
Methods: Mixture of Experts
