Heddle: A Distributed Orchestration System for Agentic RL Rollout

Zili Zhang; Yinmin Zhong; Chengxu Yang; Chao Jin; Bingyang Wu; Xinming Wei; Yuliang Liu; Xin Jin

arXiv:2603.28101 · cs.LG · March 31, 2026

TL;DR

Heddle is a trajectory-centric orchestration system for agentic RL rollouts. It targets the long-tail trajectories that bottleneck rollout through trajectory-level scheduling, trajectory-aware placement, and resource management, achieving up to 2.5× higher rollout throughput.

Contribution

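Heddle replaces step-centric rollout execution with a trajectory-centric design built on three core mechanisms that decide when, where, and how each trajectory runs. The design targets queueing delays, interference overhead, and inflated per-token time in order to raise rollout throughput.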

Findings

01. Achieves up to 2.5× higher rollout throughput.

02. Effectively neutralizes long-tail trajectory bottlenecks.

03. Demonstrates improvements across diverse RL workloads.

Abstract

Agentic Reinforcement Learning (RL) enables LLMs to solve complex tasks by alternating between a data-collection rollout phase and a policy training phase. During rollout, the agent generates trajectories, i.e., multi-step interactions between LLMs and external tools. Yet, frequent tool calls induce long-tailed trajectory generation that bottlenecks rollouts. This stems from step-centric designs that ignore trajectory context, triggering three system problems for long-tail trajectory generation: queueing delays, interference overhead, and inflated per-token time. We propose Heddle, a trajectory-centric system to optimize the when, where, and how of agentic rollout execution. Heddle integrates three core mechanisms: trajectory-level scheduling using runtime prediction and progressive priority to minimize cumulative queueing; trajectory-aware placement via presorted dynamic programming…
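
The trajectory-level scheduling idea mentioned in the abstract can be illustrated with a toy priority queue. This is only a minimal sketch under stated assumptions, not Heddle's implementation: the class `TrajectoryQueue`, its methods, and the priority rule (predicted remaining runtime plus steps already completed) are hypothetical stand-ins, since the abstract does not specify the predictor or the exact form of "progressive priority".

```python
import heapq


class TrajectoryQueue:
    """Toy queue that orders pending steps using trajectory-level context.

    Hypothetical illustration only; names and the priority rule are assumptions.
    """

    def __init__(self):
        self._heap = []
        self._seq = 0  # arrival counter, used to break ties in FIFO order

    def submit(self, traj_id, predicted_remaining, steps_done):
        # Assumed rule: trajectories predicted to run longer, and trajectories
        # that have already progressed further, are served first so that
        # long-tail trajectories accumulate less queueing delay.
        priority = predicted_remaining + steps_done
        # heapq is a min-heap, so the priority is negated to pop the largest first.
        heapq.heappush(self._heap, (-priority, self._seq, traj_id))
        self._seq += 1

    def pop(self):
        # Return the trajectory id whose next step should run now.
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)
```

A step-centric scheduler would serve these requests in arrival order; here a request from a trajectory with a long predicted remaining runtime jumps ahead of short ones, while equal-priority requests keep their arrival order.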
