ReLibra: Routing-Replay-Guided Load Balancing for MoE Training in Reinforcement Learning

Chao Jin; Xinming Wei; Yinmin Zhong; Chengxu Yang; Bingyang Wu; Ruidong Zhu; Zili Zhang; Yuliang Liu; Xin Jin

arXiv:2605.08639·cs.LG·May 12, 2026

TL;DR

ReLibra is a load-balancing system for MoE training in reinforcement learning. It uses routing replay to balance expert loads at micro-batch granularity, improving training throughput by up to 1.6× over Megatron-LM.

Contribution

It exploits routing replay in RL workflows to enable fine-grained load balancing at micro-batch level, addressing load imbalance issues in MoE training.

Findings

01

ReLibra improves training throughput by up to 1.6× over Megatron-LM.

02

ReLibra outperforms EPLB by up to 1.2× even with oracle loads.

03

ReLibra stays within 6%-10% of the throughput of an ideal balanced baseline.

Abstract

Load imbalance is a long-standing challenge in Mixture-of-Experts (MoE) training and is exacerbated in reinforcement learning (RL) for LLMs, where hot experts can shift frequently across micro-batches. Existing MoE training systems rely on historical loads to predict future expert demand, making them less effective under sharp fluctuations. We propose ReLibra, an MoE RL training system that exploits a unique opportunity in RL's rollout-training workflow, routing replay, to enable fine-grained load balancing at micro-batch granularity. Because rollout and training process the same tokens with the same MoE parameters, the token-to-expert routing decisions are known before training starts. Leveraging this information, ReLibra places two MoE load-balancing mechanisms at inter- and intra-batch timescales, matching their communication patterns to hierarchical network bandwidths. At the…
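
To make the core observation concrete, here is a minimal sketch of how routing decisions recorded during rollout could be turned into per-expert loads for a micro-batch and then used to choose an expert placement before training runs. This is not ReLibra's algorithm: the abstract does not describe its mechanisms in detail, so the sketch substitutes a simple longest-processing-time greedy heuristic. The names `expert_loads`, `greedy_placement`, and `replayed_routing`, as well as the toy data, are hypothetical.

```python
from collections import Counter

def expert_loads(routing, num_experts):
    """Count tokens routed to each expert in one micro-batch.

    routing: one list of expert ids per token (its top-k choices),
    assumed to have been recorded during rollout.
    """
    counts = Counter(e for token in routing for e in token)
    return [counts.get(e, 0) for e in range(num_experts)]

def greedy_placement(loads, num_devices):
    """Assign experts to devices, heaviest expert first, always onto the
    currently least-loaded device (a stand-in heuristic, not the paper's)."""
    device_load = [0] * num_devices
    placement = {}
    for expert in sorted(range(len(loads)), key=lambda e: (-loads[e], e)):
        target = min(range(num_devices), key=lambda d: (device_load[d], d))
        placement[expert] = target
        device_load[target] += loads[expert]
    return placement, device_load

# Hypothetical replayed routing for a 6-token micro-batch with top-2 routing.
replayed_routing = [[0, 1], [0, 2], [0, 3], [0, 1], [2, 3], [0, 1]]

loads = expert_loads(replayed_routing, num_experts=4)

# Static baseline: experts 0-1 on device 0, experts 2-3 on device 1.
static_load = [sum(loads[0:2]), sum(loads[2:4])]

placement, device_load = greedy_placement(loads, num_devices=2)

print("per-expert loads:", loads)
print("static placement load per device:", static_load)
print("replay-guided placement load per device:", device_load)
```

In this toy case the replay-derived loads let the heaviest device drop from 8 to 7 routed tokens. Because the loads are read from the rollout record rather than predicted from history, the placement can be recomputed for every micro-batch even when hot experts shift between them.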
