Heddle: A Distributed Orchestration System for Agentic RL Rollout
Zili Zhang, Yinmin Zhong, Chengxu Yang, Chao Jin, Bingyang Wu, Xinming Wei, Yuliang Liu, Xin Jin

TL;DR
Heddle is a distributed orchestration system for agentic RL rollouts. It targets the long-tail trajectories that bottleneck rollout, using trajectory-centric scheduling, placement, and resource management, and reports up to 2.5× higher rollout throughput.
Contribution
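Heddle replaces step-centric rollout execution with a trajectory-centric design built on three core mechanisms: trajectory-level scheduling, trajectory-aware placement, and resource management. Together they target the queueing delays, interference overhead, and inflated per-token time that long-tail trajectories incur.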
Findings
Achieves up to 2.5× higher rollout throughput.
Effectively neutralizes long-tail trajectory bottlenecks.
Demonstrates improvements across diverse RL workloads.
Abstract
Agentic Reinforcement Learning (RL) enables LLMs to solve complex tasks by alternating between a data-collection rollout phase and a policy training phase. During rollout, the agent generates trajectories, i.e., multi-step interactions between LLMs and external tools. Yet, frequent tool calls induce long-tailed trajectory generation that bottlenecks rollouts. This stems from step-centric designs that ignore trajectory context, triggering three system problems for long-tail trajectory generation: queueing delays, interference overhead, and inflated per-token time. We propose Heddle, a trajectory-centric system to optimize the when, where, and how of agentic rollout execution. Heddle integrates three core mechanisms: trajectory-level scheduling using runtime prediction and progressive priority to minimize cumulative queueing; trajectory-aware placement via presorted dynamic programming…
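To make the idea of trajectory-level scheduling concrete, here is a minimal Python sketch. It is not Heddle's implementation: the abstract does not specify the prediction model or the priority rule, so everything below is an illustrative assumption. The sketch assumes a hypothetical per-trajectory step-count prediction (`predicted_total_steps`), ranks trajectories longest-predicted-remaining first, and refreshes each trajectory's priority after every step as one plausible reading of "progressive priority". All class and function names are invented for this example.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Trajectory:
    tid: str
    predicted_total_steps: int  # hypothetical runtime prediction (assumption)
    steps_done: int = 0

    def predicted_remaining(self) -> int:
        return max(self.predicted_total_steps - self.steps_done, 0)


class TrajectoryScheduler:
    """Toy scheduler: picks the trajectory with the most predicted work left."""

    def __init__(self) -> None:
        self._heap = []    # entries: (-predicted_remaining, arrival_seq, trajectory)
        self._seq = 0      # unique tie-breaker, gives FIFO order among equals

    def submit(self, traj: Trajectory) -> None:
        # heapq is a min-heap, so negate the key to pop the largest remaining first.
        heapq.heappush(self._heap, (-traj.predicted_remaining(), self._seq, traj))
        self._seq += 1

    def next_trajectory(self):
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

    def complete_step(self, traj: Trajectory) -> None:
        # Priority is recomputed from the trajectory's progress, not from the step alone.
        traj.steps_done += 1
        if traj.predicted_remaining() > 0:
            self.submit(traj)


def run_order(trajectories):
    """Run one step at a time and return the order in which trajectory steps execute."""
    sched = TrajectoryScheduler()
    for t in trajectories:
        sched.submit(t)
    order = []
    while (t := sched.next_trajectory()) is not None:
        order.append(t.tid)
        sched.complete_step(t)
    return order


if __name__ == "__main__":
    print(run_order([Trajectory("a", 3), Trajectory("b", 1), Trajectory("c", 2)]))
```

In this toy model the long trajectory `a` starts immediately instead of waiting behind shorter ones, which is the intuition behind keeping long-tail trajectories from accumulating queueing delay at every step. A real system would replace the fixed step count with a learned or heuristic runtime predictor and would schedule across many workers.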
