ReLibra: Routing-Replay-Guided Load Balancing for MoE Training in Reinforcement Learning
Chao Jin, Xinming Wei, Yinmin Zhong, Chengxu Yang, Bingyang Wu, Ruidong Zhu, Zili Zhang, Yuliang Liu, Xin Jin

TL;DR
ReLibra is a load-balancing system for MoE training in reinforcement learning that uses routing replay to balance expert loads at micro-batch granularity, improving training throughput by up to 1.6× over Megatron-LM.
Contribution
It exploits routing replay in RL workflows to enable fine-grained load balancing at micro-batch level, addressing load imbalance issues in MoE training.
Findings
ReLibra improves training throughput by up to 1.6× over Megatron-LM.
ReLibra outperforms EPLB by up to 1.2× even with oracle loads.
ReLibra stays within 6%-10% of the throughput of an ideal balanced baseline.
Abstract
Load imbalance is a long-standing challenge in Mixture-of-Experts (MoE) training and is exacerbated in reinforcement learning (RL) for LLMs, where hot experts can shift frequently across micro-batches. Existing MoE training systems rely on historical loads to predict future expert demand, making them less effective under sharp fluctuations. We propose ReLibra, an MoE RL training system that exploits a unique opportunity in RL's rollout-training workflow, routing replay, to enable fine-grained load balancing at micro-batch granularity. Because rollout and training process the same tokens with the same MoE parameters, the token-to-expert routing decisions are known before training starts. Leveraging this information, ReLibra places two MoE load-balancing mechanisms at inter- and intra-batch timescales, matching their communication patterns to hierarchical network bandwidths. At the…
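To make the routing-replay idea concrete, here is a minimal Python sketch. It assumes the routing decisions recorded at rollout are available as a list of top-k expert ids per token, so per-expert loads for a micro-batch can be counted before the training step runs. The function names (`expert_loads`, `greedy_placement`, `device_loads`) and the greedy heaviest-first placement are illustrative assumptions, not ReLibra's actual mechanisms, and the sketch ignores communication cost and per-device expert capacity.

```python
import heapq
from collections import Counter


def expert_loads(routing, num_experts):
    """Count tokens per expert for one micro-batch.

    `routing` is a hypothetical replay record: one list of expert ids
    (the top-k choices) per token, captured during rollout.
    """
    counts = Counter(e for token in routing for e in token)
    return [counts.get(e, 0) for e in range(num_experts)]


def greedy_placement(loads, num_devices):
    """Assign experts to devices heaviest-first (LPT heuristic).

    This is a stand-in balancing policy for illustration only; it does
    not enforce an equal number of experts per device.
    """
    heap = [(0, d) for d in range(num_devices)]  # (current load, device id)
    heapq.heapify(heap)
    placement = {}
    for e in sorted(range(len(loads)), key=lambda e: -loads[e]):
        total, d = heapq.heappop(heap)  # least-loaded device so far
        placement[e] = d
        heapq.heappush(heap, (total + loads[e], d))
    return placement


def device_loads(loads, placement, num_devices):
    """Sum expert loads per device under a given placement."""
    out = [0] * num_devices
    for e, d in placement.items():
        out[d] += loads[e]
    return out


if __name__ == "__main__":
    # Four tokens, top-2 routing, four experts; expert 0 is "hot".
    routing = [[0, 1], [0, 2], [0, 3], [0, 1]]
    loads = expert_loads(routing, num_experts=4)
    static = {0: 0, 1: 0, 2: 1, 3: 1}  # fixed contiguous placement
    balanced = greedy_placement(loads, num_devices=2)
    print("loads per expert:", loads)
    print("static placement:", device_loads(loads, static, 2))
    print("replay-guided placement:", device_loads(loads, balanced, 2))
```

Because the loads come from the replayed routing and not from historical averages, the placement can in principle be recomputed for every micro-batch, which is the property the abstract highlights.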
