Geometric Routing Enables Causal Expert Control in Mixture of Experts
Ivan Ternovtsii, Yurii Bilak

TL;DR
This paper demonstrates that in sparse Mixture-of-Experts models with geometric (cosine-similarity) routing, expert specialization is causally meaningful, directly inspectable, and controllable through interventions.
Contribution
It introduces a geometric routing method that makes expert specialization directly inspectable and controllable, validated through causal interventions.
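To make the routing idea concrete, here is a minimal sketch of cosine-similarity top-k routing. It assumes the token has already been projected into the low-dimensional routing space. The names `cosine`, `route`, `expert_keys`, and `top_k` are hypothetical and chosen for illustration; the paper's actual router may differ in normalization and selection details.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0

def route(token_vec, expert_keys, top_k=2):
    """Return (expert_index, score) pairs for the top_k most similar expert keys.

    Assumption: each expert owns one key vector in the routing space, and
    experts are selected purely by cosine similarity to the token vector.
    """
    scores = [cosine(token_vec, key) for key in expert_keys]
    ranked = sorted(range(len(expert_keys)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in ranked[:top_k]]

# Toy 2-D routing space with three made-up expert keys.
expert_keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
selected = route([0.9, 0.1], expert_keys, top_k=2)
print(selected)  # experts 0 and 1 are selected, in that order
```

Because selection depends only on angles in a small metric space, the key vectors themselves can be read as a map of which tokens each expert attracts, which is what makes this kind of routing inspectable.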
Findings
15% of experts are monosemantic specialists, spanning 10 categories
Routing separates tokens by frequency and syntax in different layers
Causal interventions significantly alter expert-related probabilities
Abstract
Sparse Mixture-of-Experts (MoE) models scale parameters while fixing active computation per token, but the specialization of individual experts remains opaque. In a companion paper we showed that routing topology is quality-neutral: five structurally different configurations converge to statistically equivalent language modeling quality. Here we show that expert identity is nonetheless causally meaningful: individual rank-1 experts are monosemantic by construction, and cosine-similarity routing in a low-dimensional metric space makes their specialization directly inspectable. We present four lines of evidence. First, projecting expert output vectors through the unembedding matrix yields a Semantic Dictionary: 15% of experts are monosemantic specialists spanning 10 categories (temporal, geographic, cardinal, discourse, emotional, financial, military, scientific). Second, routing…
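The Semantic Dictionary step described in the abstract, projecting an expert's output vector through the unembedding matrix, can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code: the vocabulary, the matrix values, and the function name `top_tokens` are invented toy data, and the unembedding matrix is assumed to be stored as one row per vocabulary token.

```python
def top_tokens(expert_out, unembed, vocab, k=3):
    """Return the k vocabulary tokens whose logits the expert output boosts most.

    expert_out: expert output direction, length d_model.
    unembed:    list of rows, one per vocabulary token, each of length d_model
                (assumed layout; real models may store the transpose).
    """
    logits = [sum(w * x for w, x in zip(row, expert_out)) for row in unembed]
    ranked = sorted(range(len(vocab)), key=lambda i: logits[i], reverse=True)
    return [vocab[i] for i in ranked[:k]]

# Toy data, made up for illustration only.
vocab = ["Monday", "Paris", "Friday", "seven"]
unembed = [
    [1.0, 0.0],    # Monday
    [0.0, 1.0],    # Paris
    [0.9, 0.1],    # Friday
    [-0.5, -0.5],  # seven
]
expert_out = [1.0, 0.0]

print(top_tokens(expert_out, unembed, vocab, k=2))  # ['Monday', 'Friday']
```

In this toy setup the top tokens for the expert are both weekday names, which is the kind of pattern that would lead one to label an expert as a temporal specialist.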
