Task-Conditioned Routing Signatures in Sparse Mixture-of-Experts Transformers
Mynampati Sri Ranganadha Avinash

TL;DR
This paper introduces routing signatures for sparse MoE transformers and shows that expert routing patterns are task-conditioned: the signatures alone are enough to classify a prompt's task, which points to structure in conditional computation.
Contribution
The authors propose routing signatures as a tool for analyzing expert activation patterns, and show that these signatures are task-conditioned and effective for classifying tasks in sparse MoE models.
Findings
Routing signatures are highly similar for prompts from the same task (within-category similarity 0.8435 vs. 0.6225 across categories).
A classifier trained on routing signatures achieves over 92% accuracy in task classification.
Task structure becomes more evident in deeper layers of the model.
Abstract
Sparse Mixture-of-Experts (MoE) architectures enable efficient scaling of large language models through conditional computation, yet the routing mechanisms responsible for expert selection remain poorly understood. In this work, we introduce routing signatures, a vector representation summarizing expert activation patterns across layers for a given prompt, and use them to study whether MoE routing exhibits task-conditioned structure. Using OLMoE-1B-7B-0125-Instruct as an empirical testbed, we show that prompts from the same task category induce highly similar routing signatures, while prompts from different categories exhibit substantially lower similarity. Within-category routing similarity (0.8435 ± 0.0879) significantly exceeds across-category similarity (0.6225 ± 0.1687), corresponding to Cohen's d = 1.44. A logistic regression classifier trained solely on routing signatures…
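To make the abstract's quantities concrete, here is a minimal sketch of how a routing signature, the within- versus across-category similarity, and Cohen's d might be computed. It rests on assumptions that this summary does not confirm: the signature is taken to be the per-layer expert selection frequencies concatenated across layers, similarity is taken to be cosine similarity, and Cohen's d uses a sample-size-weighted pooled variance. All function names and the toy routing data (4 experts, 2 layers, top-2 routing) are hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations
from statistics import mean, variance


def routing_signature(selections, num_experts):
    """Build a signature from selections[layer][token] -> list of expert ids.

    Assumption: the signature is each layer's expert usage frequency,
    concatenated over layers. The paper's exact definition may differ.
    """
    signature = []
    for layer in selections:
        counts = [0] * num_experts
        for token_experts in layer:
            for expert in token_experts:
                counts[expert] += 1
        total = sum(counts)
        signature.extend(c / total for c in counts)
    return signature


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_similarities(signatures, labels):
    """Split all prompt-pair similarities into same-task and cross-task."""
    within, across = [], []
    for i, j in combinations(range(len(signatures)), 2):
        sim = cosine(signatures[i], signatures[j])
        (within if labels[i] == labels[j] else across).append(sim)
    return within, across


def cohens_d(a, b):
    """Cohen's d with a pooled variance weighted by sample size (assumed)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled)


# Toy data (made up): four prompts, two layers, three tokens, top-2 routing.
NUM_EXPERTS = 4
prompts = {
    "code_a": [[[0, 1], [0, 1], [0, 2]], [[2, 3], [2, 3], [2, 3]]],
    "code_b": [[[0, 1], [0, 1], [0, 1]], [[2, 3], [2, 3], [1, 3]]],
    "math_a": [[[2, 3], [2, 3], [1, 3]], [[0, 1], [0, 1], [0, 1]]],
    "math_b": [[[2, 3], [2, 3], [2, 3]], [[0, 1], [0, 2], [0, 1]]],
}
labels = ["code", "code", "math", "math"]

signatures = [routing_signature(s, NUM_EXPERTS) for s in prompts.values()]
within, across = pairwise_similarities(signatures, labels)
effect_size = cohens_d(within, across)

print(f"within-task similarity: {mean(within):.4f}")
print(f"across-task similarity: {mean(across):.4f}")
print(f"Cohen's d: {effect_size:.2f}")
```

With real models the `selections` structure would have to be derived from the router outputs at each MoE layer; how that is done for OLMoE-1B-7B-0125-Instruct is not described in this summary, so the sketch starts from already-extracted expert indices.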
Taxonomy
Topics: Mobile Crowdsensing and Crowdsourcing · Advanced Graph Neural Networks · Domain Adaptation and Few-Shot Learning
