Polysemantic Experts, Monosemantic Paths: Routing as Control in MoEs
Charles Ye, Bo Yuan, Lee Sharkey

TL;DR
This paper introduces a parameter-free decomposition for Mixture-of-Experts models that separates the control signal driving routing from the content the router cannot see, and shows that expert paths, unlike individual experts, are monosemantic.
Contribution
It presents a parameter-free decomposition that isolates the control signal causally driving routing in MoEs, clarifying how routing decisions produce semantic specialization.
Findings
Control signals encode an abstract function that rotates across layers.
Expert paths become monosemantic, clustering tokens by semantic function.
Clusters in the control subspace are more monosemantic than clusters in the full representations.
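To make the notion of an "expert path" concrete, the following is a minimal sketch that groups tokens by the sequence of experts they are routed to across layers. It assumes top-1 routing for simplicity (real MoEs often route to several experts per layer), and the helper names `expert_path` and `group_by_path`, the token labels, and the logit values are all hypothetical toy inputs, not data or code from the paper.

```python
from collections import defaultdict


def expert_path(logits_per_layer):
    """Return the top-1 expert index at each layer as a tuple (the 'path')."""
    return tuple(
        max(range(len(layer)), key=layer.__getitem__)
        for layer in logits_per_layer
    )


def group_by_path(token_logits):
    """Group token labels by the expert path their router logits select."""
    groups = defaultdict(list)
    for token, layers in token_logits:
        groups[expert_path(layers)].append(token)
    return dict(groups)


# Toy router logits: 2 layers, 3 experts per layer (made-up values).
token_logits = [
    ("tok_a#1", [[0.1, 2.0, 0.3], [1.5, 0.2, 0.1]]),
    ("tok_a#2", [[0.2, 0.1, 1.9], [0.3, 1.2, 0.4]]),
    ("tok_b#1", [[0.0, 1.7, 0.5], [2.1, 0.3, 0.2]]),
]

groups = group_by_path(token_logits)
for path, tokens in groups.items():
    print(path, tokens)
```

In this toy input, two occurrences of the same surface token land on different paths while two different tokens share one, which is the kind of structure a path-level analysis would look for.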
Abstract
An LLM's residual stream is both state and instruction: it encodes the current context and determines the next transformation. We introduce a parameter-free decomposition for Mixture-of-Experts models that splits each layer's hidden state into a control signal that causally drives routing and an orthogonal content channel invisible to the router. Across six MoE architectures, we find that models preserve surface-level features (language, token identity, position) in the content channel, while the control signal encodes an abstract function that rotates from layer to layer. Because each routing decision is low-bandwidth, this hand-off forces compositional specialization across layers. While individual experts remain polysemantic, expert paths become monosemantic, clustering tokens by semantic function across languages and surface forms. The same token (e.g., ":") follows distinct…
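One way to picture a split of this kind, as a sketch under stated assumptions rather than the authors' actual procedure: if the router is a linear map `W_router` applied to the hidden state, then projecting the hidden state onto the row space of `W_router` gives a component that fully determines the router logits, and the orthogonal remainder contributes nothing to them. The function name `split_control_content`, the linear-router assumption, and the random toy tensors below are all illustrative.

```python
import numpy as np


def split_control_content(h, W_router, tol=1e-10):
    """Split hidden states into a router-visible part and its orthogonal remainder.

    h:        (n_tokens, d_model) hidden states
    W_router: (n_experts, d_model) weights of an assumed linear router
    """
    # Orthonormal basis of the router's row space from the SVD.
    _, s, vt = np.linalg.svd(W_router, full_matrices=False)
    rank = int(np.sum(s > tol * s.max()))
    basis = vt[:rank]                      # (rank, d_model)

    control = h @ basis.T @ basis          # projection onto the row space
    content = h - control                  # lies in the router's null space
    return control, content


rng = np.random.default_rng(0)
d_model, n_experts, n_tokens = 16, 4, 5
W_router = rng.normal(size=(n_experts, d_model))
h = rng.normal(size=(n_tokens, d_model))

control, content = split_control_content(h, W_router)

# Router logits depend only on the control part; the content part is invisible.
print(np.allclose(h @ W_router.T, control @ W_router.T))
print(np.allclose(content @ W_router.T, 0.0))
```

The split introduces no learned parameters: it is computed entirely from the router weights the model already has.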
