RollMux: Phase-Level Multiplexing for Disaggregated RL Post-Training

Tianyuan Wu; Lunxi Cao; Yining Wei; Wei Gao; Yuheng Zhao; Dakai An; Shaopan Xiong; Zhiqiang Lv; Ju Huang; Siran Yang; Yinghao Yu; Jiamang Wang; Lin Qu; Wei Wang

arXiv:2512.11306 · cs.DC · December 16, 2025

Open Access

TL;DR

RollMux is a cluster scheduling framework for RL post-training. It raises hardware efficiency under rollout-training disaggregation by filling one job's idle phases with another job's active phases, while still meeting job SLOs on large GPU clusters.

Contribution

RollMux introduces the co-execution group abstraction and a two-tier scheduler for phase-level multiplexing in disaggregated RL training.

Findings

01. RollMux achieves a 1.84x cost-efficiency improvement over standard disaggregation.

02. It maintains 100% SLO attainment on a large GPU cluster.

03. It provides a provably optimal round-robin intra-group scheduling mechanism.

Abstract

Rollout-training disaggregation is emerging as the standard architecture for Reinforcement Learning (RL) post-training, where memory-bound rollout and compute-bound training are physically disaggregated onto purpose-built clusters to maximize hardware efficiency. However, the strict synchronization required by on-policy algorithms introduces severe dependency bubbles, forcing one cluster to idle while the dependent phase is running on the other. We present RollMux, a cluster scheduling framework that reclaims these bubbles through cross-cluster orchestration. RollMux is built on the insight that the structural idleness of one job can be effectively utilized by the active phase of another. To realize this, we introduce the co-execution group abstraction, which partitions the cluster into isolated locality domains. This abstraction enables a two-tier scheduling architecture: an…
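
The bubble-filling idea in the abstract can be illustrated with a toy model. The sketch below is not RollMux's scheduler: the `Job` class, both function names, and the phase durations are hypothetical, and the model ignores switching cost, memory residency, and SLOs. It only compares how busy a rollout pool and a training pool are when one job owns both pools versus when several jobs take turns on them in round-robin order.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    rollout: float  # assumed duration of one rollout phase, in seconds
    train: float    # assumed duration of one training phase, in seconds


def dedicated_utilization(job, iterations):
    """One job alone on its own rollout pool and training pool.

    On-policy synchronization makes the phases strictly alternate,
    so each pool idles while the other phase runs.
    """
    makespan = iterations * (job.rollout + job.train)
    return (iterations * job.rollout / makespan,
            iterations * job.train / makespan)


def multiplexed_utilization(jobs, iterations):
    """Several jobs share one rollout pool and one training pool.

    Jobs are served in a fixed round-robin order; each pool runs one
    phase at a time, and a job trains only after its rollout finishes.
    """
    rollout_free = 0.0  # time at which the rollout pool is next free
    train_free = 0.0    # time at which the training pool is next free
    ready = {j.name: 0.0 for j in jobs}  # earliest start of next rollout
    busy_rollout = 0.0
    busy_train = 0.0
    for _ in range(iterations):
        for j in jobs:
            start_r = max(rollout_free, ready[j.name])
            rollout_free = start_r + j.rollout
            start_t = max(train_free, rollout_free)
            train_free = start_t + j.train
            ready[j.name] = train_free
            busy_rollout += j.rollout
            busy_train += j.train
    makespan = train_free  # the last event is the final training phase
    return busy_rollout / makespan, busy_train / makespan


if __name__ == "__main__":
    a = Job("A", rollout=10.0, train=10.0)
    b = Job("B", rollout=10.0, train=10.0)
    print("dedicated:  ", dedicated_utilization(a, 100))
    print("multiplexed:", multiplexed_utilization([a, b], 100))
```

With two identical jobs whose phases have equal length, each pool in the dedicated setup is busy half the time, while the shared setup keeps both pools busy almost continuously after the first rollout. Unequal phase lengths leave residual bubbles in this toy model.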

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Reinforcement Learning in Robotics · Software-Defined Networks and 5G