ROSE: Rollout On Serving GPUs via Cooperative Elasticity for Agentic RL
Wei Gao, Yuheng Zhao, Dilxat Muhtar, Dakai An, Xuchun Shang, Tianyuan Wu, Lunxi Cao, Shaopan Xiong, Weixun Wang, Ju Huang, Teng Ma, Siran Yang, Jiamang Wang, Lin Qu, Bo Zheng, and Wei Wang

TL;DR
ROSE introduces a cooperative elasticity system that dynamically lends idle GPUs in already-deployed serving clusters to agentic RL rollout workloads. This improves end-to-end training throughput and reduces rollout time while preserving serving quality.
Contribution
It presents a novel system architecture for elastic GPU sharing in agentic RL, combining co-serving execution, fast weight transfer, and dynamic scheduling.
Findings
End-to-end throughput improved by up to 3.3x.
Rollout time reduced by up to 1.5x.
No serving SLO violations observed.
Abstract
Agentic reinforcement learning (RL) is reshaping LLM post-training, but end-to-end training time is dominated by compute-intensive, multi-turn rollouts whose resource demand varies significantly across training steps. Resource-fixed systems cannot adapt to this variation, while resource-elastic approaches that provision external GPUs on demand suffer from high allocation overhead and limited availability. We observe that serving clusters leave substantial GPU compute and memory idle, and propose cooperative elasticity: sharing already-deployed serving GPUs with rollout workloads to provide on-demand elastic capacity. Realizing this is non-trivial, as it must preserve serving SLOs under bursty traffic while minimizing cross-cluster communication overhead. We present ROSE, a system that realizes cooperative elasticity for agentic RL post-training, comprising three components: (1) an…
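The abstract describes cooperative elasticity only at a high level: rollouts borrow idle compute and memory on serving GPUs without endangering serving SLOs under bursty traffic. The toy Python sketch below illustrates that idea and is not ROSE's actual scheduler. It rests on three assumptions: each serving GPU reports fractional compute and memory utilization, a fixed `reserve` fraction is kept free to absorb serving bursts, and rollout demand is measured in GPU-equivalents. The names `ServingGPU`, `lendable_share`, `plan_lending` and `reserve` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ServingGPU:
    # Hypothetical snapshot of one serving GPU; utilizations are fractions in [0, 1].
    name: str
    compute_util: float
    memory_util: float


def lendable_share(gpu, reserve=0.2):
    """Fraction of this GPU that could be lent to rollout work.

    Keeps `reserve` (an assumed tuning knob) free for serving bursts and is
    bounded by whichever resource, compute or memory, is tighter.
    """
    spare = min(1.0 - gpu.compute_util, 1.0 - gpu.memory_util) - reserve
    return max(0.0, spare)


def plan_lending(gpus, rollout_demand, reserve=0.2):
    """Greedily borrow spare capacity, most-idle GPUs first.

    Returns (plan, unmet): plan maps GPU name -> lent fraction, and unmet is
    the rollout demand (in GPU-equivalents) that serving GPUs cannot cover.
    """
    plan = {}
    remaining = rollout_demand
    for gpu in sorted(gpus, key=lambda g: lendable_share(g, reserve), reverse=True):
        if remaining <= 0:
            break
        share = min(lendable_share(gpu, reserve), remaining)
        if share > 0:
            plan[gpu.name] = share
            remaining -= share
    return plan, max(0.0, remaining)


if __name__ == "__main__":
    fleet = [
        ServingGPU("s0", compute_util=0.3, memory_util=0.5),
        ServingGPU("s1", compute_util=0.9, memory_util=0.4),  # busy: lends nothing
        ServingGPU("s2", compute_util=0.1, memory_util=0.2),
    ]
    print(plan_lending(fleet, rollout_demand=0.8))
```

Under this assumed policy, a larger `reserve` protects serving latency at the cost of less borrowable capacity. Any `unmet` demand would have to wait or run on dedicated rollout GPUs. The policy also ignores the cross-cluster communication overhead that the abstract names as a second constraint.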
