Three Phases of Expert Routing: How Load Balance Evolves During Mixture-of-Experts Training

Charafeddine Mouzouni

arXiv:2604.04230·cs.LG·April 7, 2026

TL;DR

This paper models Mixture-of-Experts (MoE) token routing as a congestion game and finds a three-phase load-balance trajectory during training, over which the router moves from prioritizing balance to prioritizing quality.

Contribution

It introduces a theoretical framework for MoE routing dynamics, capturing load balance evolution and providing diagnostics for load prediction and robustness verification.

Findings

01. Load balancing peaks during the surge phase, early in training.

02. Experts specialize under steady balance during the stabilization phase.

03. The router shifts from balancing load to prioritizing quality during the relaxation phase.

Abstract

We model Mixture-of-Experts (MoE) token routing as a congestion game with a single effective parameter, the congestion coefficient gamma_eff, that quantifies the balance-quality tradeoff. Tracking gamma_eff across training checkpoints of two open-source MoE models, OLMoE-1B-7B (20 checkpoints, with dense sampling in the surge region) and OpenMoE-8B (6 checkpoints), reveals a three-phase trajectory: a surge phase where the router learns to balance load (gamma_eff: 14 to 36-39, peaking in the step 30K-40K region), a stabilization phase where experts specialize under steady balance (B_0: 2.4 to 2.3, steps 100K-400K), and a relaxation phase where the router trades balance for quality as experts differentiate (gamma_eff: 27 to 9, steps 400K-1.2M). This non-monotone trajectory, invisible to post-hoc analysis of converged models, reveals that early MoE training prioritizes balance while late…
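
The abstract excerpt does not give the game's utility function or the estimator for gamma_eff, so the following is only a toy illustration of how a congestion coefficient can trade routing quality against load balance. It assumes, hypothetically, that each token picks the expert maximizing its affinity minus gamma times the fraction of tokens already sent to that expert. The function names (`route_tokens`, `imbalance`, `make_scores`), the affinity scores, and the max-over-mean imbalance statistic are all illustrative choices, not the paper's definitions.

```python
import random


def route_tokens(scores, gamma):
    """Greedy sequential routing in a toy congestion game (an assumption,
    not the paper's model).

    scores[t][e] is a hypothetical affinity of token t for expert e.
    Each token picks the expert maximizing affinity minus a congestion
    penalty: gamma * (fraction of all tokens already assigned to it).
    Returns the per-expert token counts.
    """
    n_tokens = len(scores)
    n_experts = len(scores[0])
    load = [0] * n_experts
    for token_scores in scores:
        utilities = [
            token_scores[e] - gamma * load[e] / n_tokens
            for e in range(n_experts)
        ]
        best = max(range(n_experts), key=lambda e: utilities[e])
        load[best] += 1
    return load


def imbalance(load):
    """Max load divided by mean load; 1.0 means perfectly balanced."""
    mean = sum(load) / len(load)
    return max(load) / mean


def make_scores(n_tokens=1000, n_experts=8, seed=0):
    """Synthetic affinities: expert 0 gets a constant bonus so that
    routing without a congestion penalty is skewed toward it."""
    rng = random.Random(seed)
    return [
        [rng.random() + (0.5 if e == 0 else 0.0) for e in range(n_experts)]
        for _ in range(n_tokens)
    ]


scores = make_scores()
for gamma in (0.0, 5.0, 50.0):
    load = route_tokens(scores, gamma)
    print(f"gamma={gamma:5.1f}  load={load}  imbalance={imbalance(load):.2f}")
```

In this sketch a larger gamma pushes loads toward uniform at the cost of sending some tokens to lower-affinity experts, which is the qualitative tradeoff the abstract attributes to the congestion coefficient. It says nothing about the reported values of gamma_eff or B_0.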
