AstraFlow: Dataflow-Oriented Reinforcement Learning for Agentic LLMs

Haizhong Zheng; Yizhuo Di; Jiahui Wang; Shuowei Jin; Xueshen Liu; Yongji Wu; Z. Morley Mao; Ion Stoica; Jiawei Zhao; Beidi Chen

arXiv:2605.15565 · cs.LG · May 18, 2026

TL;DR

AstraFlow is a dataflow-oriented reinforcement learning system for large language models that enables efficient multi-policy training and elastic resource utilization with minimal system engineering effort.

Contribution

AstraFlow replaces trainer-centered control with principled component abstractions, supporting complex workloads and diverse compute resources without system modifications.

Findings

01 Supports multi-policy training and elastic scaling.

02 Trains 2.7x faster than existing RL systems.

03 Maintains comparable or better accuracy across various workloads.

Abstract

Reinforcement learning (RL) is increasingly used to improve the reasoning, coding, and tool-use capabilities of large language models, but agentic RL remains prohibitively expensive. Scaling RL to agentic LLMs requires supporting complex workloads, including multi-policy collaborative training, while efficiently using elastic, heterogeneous, and cross-region compute resources. Existing LLM RL systems support some of these capabilities, but each new extension often requires dedicated system engineering. This burden arises from trainer-centered control architectures and the lack of principled abstractions for RL system components. To address these limitations, we propose AstraFlow, a dataflow-oriented RL system that replaces conventional trainer-centered control with principled component abstractions. In AstraFlow, rollout services, dataflow management, and training are decoupled into…

Code & Models

Repository: infini-ai-lab/astraflow (GitHub)