Does RL Expand the Capability Boundary of LLM Agents? A PASS@(k,T) Analysis
Zhiyuan Zhai, Wenjing Yan, Xiaodan Shao, Xin Wang

TL;DR
This paper introduces PASS@(k,T), a two-dimensional metric over sampling budget k and interaction depth T, to test whether reinforcement learning expands what large language model agents can solve or merely makes them more reliable, with a focus on complex, compositional tasks.
Contribution
The study demonstrates that RL genuinely enlarges the capability boundary of LLM agents in agentic tool use. The expansion is specific to tasks requiring compositional, sequential information gathering; on simpler tasks RL behaves as prior static-reasoning work predicts.
Findings
RL expands the capability boundary in agentic tool use tasks.
The gap between the RL and base pass-curves widens at large k instead of converging, indicating capability growth rather than only improved reliability.
Supervised fine-tuning regresses the boundary, isolating exploration as a key factor.
Abstract
Does reinforcement learning genuinely expand what LLM agents can do, or merely make them more reliable? For static reasoning, recent work answers the second: base and RL pass@k curves converge at large k. We ask whether this holds for agentic tool use, where T rounds of interaction enable compositional strategies that re-sampling cannot recover. We introduce PASS@(k,T), a two-dimensional metric that jointly varies sampling budget k and interaction depth T, separating capability expansion from efficiency improvement. Our main finding is that, contrary to the static-reasoning result, tool-use RL genuinely enlarges the capability boundary: the RL agent's pass-curve pulls above the base model's and the gap widens at large k rather than converging. The expansion is specific to compositional, sequential information gathering; on simpler tasks RL behaves as prior work predicts. Under matched…
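The abstract as shown here does not give the formal definition of PASS@(k,T), so the following is only a minimal sketch of how such a two-dimensional metric could be computed. `pass_at_k` is the standard unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), for n samples with c successes. `pass_at_k_T` is a hypothetical helper, not the authors' code: it assumes each rollout records the number of interaction rounds it needed to succeed (or `None` on failure), and that a rollout counts as a success at depth T if it succeeded within T rounds.

```python
from math import comb
from typing import Optional, Sequence


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n sampled attempts with c successes."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when k > n - c, so the estimate becomes 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_at_k_T(rounds_to_solve: Sequence[Optional[int]], k: int, T: int) -> float:
    """Hypothetical PASS@(k,T) for one task.

    rounds_to_solve[i] is the number of interaction rounds rollout i used
    to succeed, or None if it failed. ASSUMPTION: a rollout that succeeded
    in r rounds is treated as a success for any depth budget T >= r.
    """
    n = len(rounds_to_solve)
    c = sum(1 for r in rounds_to_solve if r is not None and r <= T)
    return pass_at_k(n, c, k)


# Four rollouts of one task: solved in 1 round, failed, solved in 3 rounds, failed.
rollouts = [1, None, 3, None]
shallow = pass_at_k_T(rollouts, k=2, T=2)  # only the 1-round success counts
deep = pass_at_k_T(rollouts, k=2, T=3)     # both successes count
```

Sweeping k and T over a grid and averaging over tasks would give one surface per model. Under this sketch, a gap between base and RL curves that persists or widens as k grows is the pattern the abstract reads as capability expansion, while curves that converge at large k would indicate only an efficiency gain.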
