SkyRL-Agent: Efficient RL Training for Multi-turn LLM Agent

Shiyi Cao; Dacheng Li; Fangzhou Zhao; Shuo Yuan; Sumanth R. Hegde; Connor Chen; Charlie Ruan; Tyler Griggs; Shu Liu; Eric Tang; Richard Liaw; Philipp Moritz; Matei Zaharia; Joseph E. Gonzalez; Ion Stoica

arXiv:2511.16108·cs.AI·November 21, 2025

TL;DR

SkyRL-Agent is a framework for efficient multi-turn, long-horizon RL training and evaluation of LLM agents. It speeds up training, improves code navigation through an AST-based search tool, and generalizes across agentic tasks.

Contribution

SkyRL-Agent introduces an optimized asynchronous pipeline dispatcher and a tool-enhanced training recipe, which together improve the speed and effectiveness of RL training for LLM agents.
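To make the idea of stage-aware asynchronous dispatch concrete, here is a minimal sketch in Python's `asyncio`. It is not the SkyRL-Agent dispatcher: the stage names (`init_env`, `generate`, `reward`), the concurrency limits, and the helper names are all hypothetical, and `asyncio.sleep` stands in for real work. The sketch only shows the general pattern of letting many trajectories run concurrently while each stage has its own concurrency cap, so a slow stage does not block trajectories that are ready for a different stage.

```python
import asyncio

# Hypothetical stage names and concurrency limits, chosen only for illustration.
STAGE_LIMITS = {"init_env": 2, "generate": 4, "reward": 2}


async def run_trajectory(task_id, semaphores, events):
    """Move one trajectory through every stage, respecting per-stage limits."""
    for stage in STAGE_LIMITS:
        async with semaphores[stage]:
            events.append((task_id, stage))
            await asyncio.sleep(0.001)  # stand-in for real work in this stage
    return task_id


async def dispatch(task_ids):
    """Launch all trajectories at once; the semaphores shape the pipeline."""
    semaphores = {s: asyncio.Semaphore(n) for s, n in STAGE_LIMITS.items()}
    events = []
    results = await asyncio.gather(
        *(run_trajectory(t, semaphores, events) for t in task_ids)
    )
    return list(results), events


def run_demo(num_tasks=6):
    return asyncio.run(dispatch(range(num_tasks)))


if __name__ == "__main__":
    results, events = run_demo()
    print(results)
    print(events)
```

In this sketch a trajectory that finishes `init_env` can start `generate` immediately, even while other trajectories are still waiting for an `init_env` slot. How the actual dispatcher schedules stages and obtains its reported speedup is described in the paper, not here.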

Findings

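01 The optimized asynchronous pipeline dispatcher achieves a 1.55x speedup over naive asynchronous batching.

02 SA-SWE-32B reaches 39.4% Pass@1 on SWE-Bench Verified, up from 24.4% for the Qwen3-32B base model.

03 The model generalizes to various agentic tasks.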

Abstract

We introduce SkyRL-Agent, a framework for efficient, multi-turn, long-horizon agent training and evaluation. It provides efficient asynchronous dispatching, lightweight tool integration, and flexible backend interoperability, enabling seamless use with existing RL frameworks such as SkyRL-train, VeRL, and Tinker. Using SkyRL-Agent, we train SA-SWE-32B, a software engineering agent trained from Qwen3-32B (24.4% Pass@1) purely with reinforcement learning. We introduce two key components: an optimized asynchronous pipeline dispatcher that achieves a 1.55x speedup over naive asynchronous batching, and a tool-enhanced training recipe leveraging an AST-based search tool to facilitate code navigation, boost rollout Pass@K, and improve training efficiency. Together, these optimizations enable SA-SWE-32B to reach 39.4% Pass@1 on SWE-Bench Verified with more than 2x cost reduction compared to…
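As an illustration of what an AST-based search tool can look like, the sketch below uses Python's standard `ast` module to locate function and class definitions by name. This is an assumption-laden toy, not the tool used to train SA-SWE-32B: the function name `find_definitions`, its return format, and the sample source are all hypothetical. The point is only that searching the syntax tree returns definitions rather than every textual mention of a name, which is what makes this style of tool useful for code navigation.

```python
import ast


def find_definitions(source, name):
    """Return (kind, name, lineno) for each function or class named `name`.

    Hypothetical helper for illustration; not the paper's search tool.
    """
    tree = ast.parse(source)
    matches = []
    for node in ast.walk(tree):
        is_func = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        is_class = isinstance(node, ast.ClassDef)
        if (is_func or is_class) and node.name == name:
            kind = "class" if is_class else "function"
            matches.append((kind, node.name, node.lineno))
    # Sort by line number so results read top to bottom.
    return sorted(matches, key=lambda m: m[2])


# Small made-up module used only to exercise the helper.
EXAMPLE_SOURCE = '''\
class Parser:
    def parse(self, text):
        return text.split()

def parse(text):
    return Parser().parse(text)
'''

if __name__ == "__main__":
    print(find_definitions(EXAMPLE_SOURCE, "parse"))
    print(find_definitions(EXAMPLE_SOURCE, "Parser"))
```

A plain text search for `parse` in the sample would also hit the call site on the last line; the AST walk reports only the two definitions.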

Taxonomy

Topics: Security and Verification in Computing · Parallel Computing and Optimization Techniques · Big Data and Digital Economy