RollArt: Scaling Agentic RL Training via Disaggregated Infrastructure

Wei Gao; Yuheng Zhao; Tianyuan Wu; Shaopan Xiong; Weixun Wang; Dakai An; Lunxi Cao; Dilxat Muhtar; Zichen Liu; Haizhou Zhao; Ju Huang; Siran Yang; Yongbin Li; Wenbo Su; Jiamang Wang; Lin Qu; Bo Zheng; Wei Wang

arXiv:2512.22560 · cs.DC · December 30, 2025

Open Access

TL;DR

This paper introduces RollArt, a distributed system that makes agentic RL training on disaggregated infrastructure more efficient, raising hardware utilization and reducing training time.

Contribution

RollArt is a novel system that leverages hardware affinity, fine-grained asynchrony, and statefulness-aware computation to improve agentic RL training throughput on disaggregated hardware.

Findings

1. Achieves a 1.35-2.05× reduction in training time.

2. Scales to models with hundreds of billions of parameters.

3. Demonstrates robustness on large GPU clusters.

Abstract

Agentic Reinforcement Learning (RL) enables Large Language Models (LLMs) to perform autonomous decision-making and long-term planning. Unlike standard LLM post-training, agentic RL workloads are highly heterogeneous, combining compute-intensive prefill phases, bandwidth-bound decoding, and stateful, CPU-heavy environment simulations. We argue that efficient agentic RL training requires disaggregated infrastructure to leverage specialized, best-fit hardware. However, naive disaggregation introduces substantial synchronization overhead and resource underutilization due to the complex dependencies between stages. We present RollArt, a distributed system designed to maximize throughput for multi-task agentic RL on disaggregated infrastructure. RollArt is built on three core principles: (1) hardware-affinity workload mapping, which routes compute-bound and bandwidth-bound tasks to best-fit…

Taxonomy

Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Advanced Neural Network Applications