MoE Routing Testbed: Studying Expert Specialization and Routing Behavior at Small Scale

Tobias Falke; Nicolas Anastassacos; Samson Tan; Chankrisna Richy Meas; Chandana Satya Prakash; Nitesh Sekhar; M Saiful Bari; Krishna Kompella; Gamaleldin F. Elsayed

arXiv:2604.07030 · cs.LG · April 9, 2026

TL;DR

The paper introduces the MoE Routing Testbed, a new setup for analyzing expert specialization and routing in sparse Mixture-of-Experts models at small scale, with implications for large-scale LLMs.

Contribution

It proposes a testbed that clarifies routing dynamics and expert specialization, enabling better evaluation of MoE routing techniques.

Findings

01. Balancing scope is key to expert specialization and utilization.

02. The testbed provides a clear upper bound for routing performance.

03. Observations at small scale generalize to models 35x larger.

Abstract

Sparse Mixture-of-Experts (MoE) architectures are increasingly popular for frontier large language models (LLMs), but they introduce training challenges due to routing complexity. Fully leveraging the parameters of an MoE model requires all experts to be well trained and to specialize in non-redundant ways. Assessing this, however, is complicated by the lack of established metrics and, importantly, by the fact that many routing techniques exhibit similar performance at smaller sizes, which often does not reflect their behavior at large scale. To address this challenge, we propose the MoE Routing Testbed, a setup that gives clearer visibility into routing dynamics at small scale while using realistic data. The testbed pairs a data mix of clearly distinguishable domains with a reference router that prescribes ideal routing based on these domains, providing a well-defined upper bound for comparison. This…
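
The reference-router idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the domain names, the one-expert-per-domain mapping, and the helpers `reference_route`, `routing_agreement`, and `expert_load` are all hypothetical, chosen only to show how a domain-based reference assignment could be compared against a learned router's top-1 choices.

```python
from collections import Counter


def reference_route(domains, domain_to_expert):
    """Return the prescribed expert index for each token, given its domain label.

    Assumption: each domain maps to exactly one expert (hypothetical setup).
    """
    return [domain_to_expert[d] for d in domains]


def routing_agreement(learned, reference):
    """Fraction of tokens whose learned top-1 expert equals the reference expert."""
    if len(learned) != len(reference):
        raise ValueError("learned and reference must have the same length")
    if not reference:
        return 0.0
    matches = sum(1 for l, r in zip(learned, reference) if l == r)
    return matches / len(reference)


def expert_load(assignments, num_experts):
    """Share of tokens assigned to each expert (a simple utilization measure)."""
    if not assignments:
        return [0.0] * num_experts
    counts = Counter(assignments)
    total = len(assignments)
    return [counts.get(e, 0) / total for e in range(num_experts)]


# Hypothetical toy data: four tokens drawn from three made-up domains.
domain_to_expert = {"code": 0, "math": 1, "prose": 2}
token_domains = ["code", "math", "code", "prose"]

reference = reference_route(token_domains, domain_to_expert)
learned = [0, 1, 2, 2]  # made-up top-1 choices of a learned router

agreement = routing_agreement(learned, reference)
load = expert_load(reference, num_experts=3)
```

Note that a learned router's expert indices are arbitrary up to permutation, so a direct comparison like this assumes the indices have already been aligned with the reference mapping.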
