Agent-RLVR: Training Software Engineering Agents via Guidance and Environment Rewards

Jeff Da; Clinton Wang; Xiang Deng; Yuntao Ma; Nikhil Barhate; Sean Hendryx

arXiv:2506.11425 · cs.CL · June 24, 2025

TL;DR

Agent-RLVR enhances reinforcement learning from verifiable rewards by incorporating active guidance mechanisms, significantly improving software engineering task performance in complex environments where traditional RLVR struggles.

Contribution

Introduces Agent-RLVR, a novel framework that integrates agent guidance into RLVR, enabling effective training of agents in complex, multi-step environments like software engineering tasks.

Findings

01 Agent-RLVR raised the pass@1 of Qwen-2.5-72B-Instruct from 9.4% to 22.4% on SWE-Bench Verified.

02 Guidance-augmented RLVR data also improves the training of a test-time reward model.

03 The framework is effective for training agents in complex, real-world environments such as software engineering tasks.

Abstract

Reinforcement Learning from Verifiable Rewards (RLVR) has been widely adopted as the de facto method for enhancing the reasoning capabilities of large language models and has demonstrated notable success in verifiable domains like math and competitive programming tasks. However, the efficacy of RLVR diminishes significantly when applied to agentic environments. These settings, characterized by multi-step, complex problem solving, lead to high failure rates even for frontier LLMs, as the reward landscape is too sparse for effective model training via conventional RLVR. In this work, we introduce Agent-RLVR, a framework that makes RLVR effective in challenging agentic settings, with an initial focus on software engineering tasks. Inspired by human pedagogy, Agent-RLVR introduces agent guidance, a mechanism that actively steers the agent towards successful trajectories by leveraging…
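The sparse-reward problem described in the abstract can be illustrated with a small calculation. The sketch below is not from the paper: it assumes each rollout on a task succeeds independently with a fixed probability `p`, and the function name `prob_any_success` and the example values are hypothetical. It shows how rarely a task yields any positive reward when `p` is small, which is the regime where conventional RLVR has little signal to learn from.

```python
def prob_any_success(p: float, k: int) -> float:
    """Probability that at least one of k independent rollouts succeeds.

    Assumes (for illustration only) that every rollout on a task succeeds
    independently with the same probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    return 1.0 - (1.0 - p) ** k


if __name__ == "__main__":
    # Hypothetical per-task success rates, chosen only to show the trend.
    for p in (0.01, 0.05, 0.5):
        for k in (4, 16):
            print(f"p={p:.2f}, k={k:2d}: "
                  f"P(any success)={prob_any_success(p, k):.3f}")
```

Under these assumptions, a task with a very low per-rollout success rate usually produces only failing trajectories in a batch, so a verifiable reward alone gives the policy almost nothing to reinforce. Guidance that raises the success rate on such tasks changes this.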

Taxonomy

Topics: Multi-Agent Systems and Negotiation · Business Process Modeling and Analysis

Methods: Focus