MoE Routing Testbed: Studying Expert Specialization and Routing Behavior at Small Scale
Tobias Falke, Nicolas Anastassacos, Samson Tan, Chankrisna Richy Meas, Chandana Satya Prakash, Nitesh Sekhar, M Saiful Bari, Krishna Kompella, Gamaleldin F. Elsayed

TL;DR
The paper introduces the MoE Routing Testbed, a new setup for analyzing expert specialization and routing in sparse Mixture-of-Experts models at small scale, with implications for large-scale LLMs.
Contribution
The paper proposes a testbed that makes routing dynamics and expert specialization easier to observe, enabling better evaluation of MoE routing techniques.
Findings
The scope over which load balancing is applied is key to expert specialization and utilization.
The testbed provides a clear upper bound for routing performance.
Observations at small scale generalize to models 35x larger.
Abstract
Sparse Mixture-of-Experts (MoE) architectures are increasingly popular for frontier large language models (LLMs), but they introduce training challenges due to routing complexity. Fully leveraging the parameters of an MoE model requires all experts to be well-trained and to specialize in non-redundant ways. Assessing this, however, is complicated by the lack of established metrics and, importantly, by the fact that many routing techniques exhibit similar performance at smaller sizes, which is often not reflective of their behavior at large scale. To address this challenge, we propose the MoE Routing Testbed, a setup that gives clearer visibility into routing dynamics at small scale while using realistic data. The testbed pairs a data mix with clearly distinguishable domains with a reference router that prescribes ideal routing based on these domains, providing a well-defined upper bound for comparison. This…
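The idea of a reference router that prescribes routing from domain labels can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the one-expert-per-domain mapping, the top-1 routing, and the toy data are all assumptions made for illustration only.

```python
from collections import Counter

def reference_route(domains, domain_to_expert):
    """Prescribe one expert per token from its domain label.

    Assumption: a fixed one-expert-per-domain mapping and top-1 routing.
    """
    return [domain_to_expert[d] for d in domains]

def expert_utilization(assignments, num_experts):
    """Return the fraction of tokens sent to each expert."""
    counts = Counter(assignments)
    total = len(assignments)
    return [counts.get(e, 0) / total for e in range(num_experts)]

def agreement(learned, reference):
    """Return the share of tokens where the learned router picks the reference expert."""
    matches = sum(1 for a, b in zip(learned, reference) if a == b)
    return matches / len(reference)

# Toy data (hypothetical): one domain label per token.
domains = ["code", "math", "code", "legal", "math", "legal"]
domain_to_expert = {"code": 0, "math": 1, "legal": 2}

reference = reference_route(domains, domain_to_expert)

# Hypothetical top-1 choices from a learned router on the same tokens.
learned = [0, 1, 0, 2, 2, 2]

print("reference:", reference)
print("agreement:", agreement(learned, reference))
print("utilization:", expert_utilization(learned, num_experts=3))
```

In practice a learned router may specialize along the same domains but with expert indices permuted, so a direct index comparison like `agreement` would need a matching step first; that step is omitted here to keep the sketch short.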
