SeamlessFlow: A Trainer Agent Isolation RL Framework Achieving Bubble-Free Pipelines via Tag Scheduling

Jinghui Wang; Shaojie Wang; Yinghan Cui; Xuxing Chen; Chao Wang; Xiaojiang Zhang; Minglei Zhang; Jiarong Zhang; Wenhao Zhuang; Yuchen Cao; Wankang Bao; Haimo Li; Zheng Lin; Huiming Wang; Haoyang Huang; Zongxian Feng; Zizheng Zhan; Ken Deng; Wen Xiang; Huaixi Tang; Kun Wu; Mengtong Li; Mengfei Xie; Junyi Peng; Haotian Zhang; Bin Chen; Bing Yu

arXiv:2508.11553 · cs.LG · August 18, 2025

TL;DR

SeamlessFlow is a reinforcement learning framework that decouples training from execution, maximizes GPU utilization, and employs tag-based scheduling to eliminate pipeline bubbles in large-scale RL deployments.

Contribution

It introduces a data plane for decoupling training from agents and a tag-driven scheduling paradigm for resource optimization in RL pipelines.

Findings

01 Achieves high throughput with minimal idle time.

02 Eliminates pipeline bubbles in complex RL tasks.

03 Supports scalable, stable multi-agent RL training.

Abstract

We introduce SeamlessFlow, a server-based reinforcement learning (RL) framework that addresses two core challenges in industrial-scale RL: (1) decoupling RL training from the complex execution flow of agents; (2) maximizing GPU utilization with minimal idle time while preserving the stability and scalability required for large-scale deployments. First, SeamlessFlow introduces a data plane that decouples the RL trainer from diverse, complex agent implementations while sustaining high throughput. A central trajectory manager maintains complete interaction histories and supports partial rollout, allowing rollout to pause for weight updates and resume seamlessly, keeping agents unaware of service interruptions. Second, we propose a tag-driven scheduling paradigm that abstracts hardware into capability-tagged resources, unifying colocated and disaggregated architectures. Based on this,…
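
The idea of capability-tagged resources can be illustrated with a small sketch. This is not the paper's implementation: the abstract above is cut off before it describes the actual scheduling policy. The class names `Resource` and `TagScheduler`, the tag strings `"rollout"` and `"train"`, and the "fewest tags first" selection rule are all assumptions made for illustration. The sketch only shows how one scheduler interface could serve both a disaggregated pool (one tag per GPU) and a colocated pool (both tags on every GPU).

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A hardware unit described only by its capability tags (hypothetical)."""
    name: str
    tags: frozenset  # e.g. {"rollout"}, {"train"}, or both
    busy: bool = False


@dataclass
class TagScheduler:
    """Hypothetical scheduler: hands out an idle resource carrying the requested tag."""
    resources: list = field(default_factory=list)

    def acquire(self, tag):
        # Candidates are idle resources that advertise the requested capability.
        idle = [r for r in self.resources if not r.busy and tag in r.tags]
        if not idle:
            return None
        # Assumed policy: prefer the resource with the fewest tags, so that
        # flexible multi-tag resources stay free for other kinds of work.
        chosen = min(idle, key=lambda r: len(r.tags))
        chosen.busy = True
        return chosen

    def release(self, resource):
        # Mark the resource idle so it can take on any of its tagged roles again.
        resource.busy = False


# Disaggregated pool: each GPU carries a single capability tag.
disaggregated = TagScheduler([
    Resource("gpu0", frozenset({"rollout"})),
    Resource("gpu1", frozenset({"train"})),
])

# Colocated pool: every GPU carries both tags and can switch roles.
colocated = TagScheduler([
    Resource("gpu0", frozenset({"rollout", "train"})),
    Resource("gpu1", frozenset({"rollout", "train"})),
])
```

Under these assumptions, the colocated pool can lend both GPUs to `"train"` during a training phase and hand them back to `"rollout"` after `release`, while the disaggregated pool can only ever offer `gpu1` for training. The calling code is identical in both cases; only the tags differ.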
