RollMux: Phase-Level Multiplexing for Disaggregated RL Post-Training
Tianyuan Wu, Lunxi Cao, Yining Wei, Wei Gao, Yuheng Zhao, Dakai An, Shaopan Xiong, Zhiqiang Lv, Ju Huang, Siran Yang, Yinghao Yu, Jiamang Wang, Lin Qu, Wei Wang

TL;DR
RollMux is a cluster scheduling framework for disaggregated RL post-training. It fills the idle time that one job's synchronization leaves on the rollout or training cluster with the active phase of another job, improving hardware efficiency while still meeting SLOs on a large GPU cluster.
Contribution
It introduces the co-execution group abstraction and a two-tier scheduler to optimize phase-level multiplexing in disaggregated RL training.
Findings
RollMux achieves a 1.84x cost-efficiency improvement over standard disaggregation.
It maintains 100% SLO attainment on a large GPU cluster.
It provides a provably optimal round-robin intra-group scheduling mechanism.
Abstract
Rollout-training disaggregation is emerging as the standard architecture for Reinforcement Learning (RL) post-training, where memory-bound rollout and compute-bound training are physically disaggregated onto purpose-built clusters to maximize hardware efficiency. However, the strict synchronization required by on-policy algorithms introduces severe dependency bubbles, forcing one cluster to idle while the dependent phase is running on the other. We present RollMux, a cluster scheduling framework that reclaims these bubbles through cross-cluster orchestration. RollMux is built on the insight that the structural idleness of one job can be effectively utilized by the active phase of another. To realize this, we introduce the co-execution group abstraction, which partitions the cluster into isolated locality domains. This abstraction enables a two-tier scheduling architecture: an…
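The dependency bubble and the benefit of multiplexing can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not RollMux's scheduler: the `Job` class, the `simulate_round_robin` function, and the phase durations are all hypothetical. It assumes one rollout cluster and one training cluster, each serving one phase at a time, and strictly on-policy jobs whose next rollout cannot start until the previous training step finishes.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical RL job with fixed per-iteration phase durations."""
    name: str
    rollout_time: float  # time on the rollout cluster per iteration
    train_time: float    # time on the training cluster per iteration


def simulate_round_robin(jobs, iterations):
    """Serve jobs in fixed round-robin order on two exclusive clusters.

    Returns the makespan and the busy fraction of each cluster.
    """
    rollout_free = 0.0  # time at which the rollout cluster is next free
    train_free = 0.0    # time at which the training cluster is next free
    ready = {job.name: 0.0 for job in jobs}  # when each job may roll out again
    rollout_busy = 0.0
    train_busy = 0.0

    for _ in range(iterations):
        for job in jobs:
            # Rollout waits for the cluster and for the job's last update.
            start_r = max(rollout_free, ready[job.name])
            end_r = start_r + job.rollout_time
            rollout_free = end_r

            # Training waits for the cluster and for this job's rollout.
            start_t = max(train_free, end_r)
            end_t = start_t + job.train_time
            train_free = end_t

            # On-policy synchronization: next rollout needs the new weights.
            ready[job.name] = end_t
            rollout_busy += job.rollout_time
            train_busy += job.train_time

    makespan = train_free  # the last phase to finish is always a training step
    return {
        "makespan": makespan,
        "rollout_util": rollout_busy / makespan,
        "train_util": train_busy / makespan,
    }


if __name__ == "__main__":
    a = Job("A", rollout_time=2.0, train_time=2.0)
    b = Job("B", rollout_time=2.0, train_time=2.0)
    print("single job:", simulate_round_robin([a], iterations=10))
    print("two jobs  :", simulate_round_robin([a, b], iterations=10))
```

With these illustrative numbers, a single job keeps each cluster busy only half the time, while two jobs sharing the same clusters push utilization to roughly 95%, because one job's rollout overlaps the other's training. Real workloads have unequal and variable phase lengths, which is where group formation and scheduling policy start to matter.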
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Reinforcement Learning in Robotics · Software-Defined Networks and 5G
