Agent-RLVR: Training Software Engineering Agents via Guidance and Environment Rewards
Jeff Da, Clinton Wang, Xiang Deng, Yuntao Ma, Nikhil Barhate, Sean Hendryx

TL;DR
Agent-RLVR enhances reinforcement learning from verifiable rewards by incorporating active guidance mechanisms, significantly improving software engineering task performance in complex environments where traditional RLVR struggles.
Contribution
Introduces Agent-RLVR, a novel framework that integrates agent guidance into RLVR, enabling effective training of agents in complex, multi-step environments like software engineering tasks.
Findings
Pass@1 of Qwen-2.5-72B-Instruct on SWE-Bench Verified increased from 9.4% to 22.4%.
Guidance-augmented RLVR data also improves test-time reward model training, raising pass@1 further to 27.8%.
Framework demonstrates effectiveness in complex, real-world environments.
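To put the reported pass@1 change in perspective, the short sketch below computes the absolute gain in percentage points and the relative multiple. The helper name `improvement` is hypothetical and is not taken from the paper; only the two percentages come from the findings above.

```python
def improvement(before: float, after: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative multiple)."""
    return after - before, after / before


# Pass@1 figures reported in the findings above (percent).
abs_gain, rel = improvement(9.4, 22.4)
print(f"{abs_gain:.1f} percentage points, {rel:.2f}x the baseline")
```

The absolute gain and the multiple answer different questions: the first says how many more tasks out of 100 are solved on the first attempt, the second how the trained model compares with its own starting point.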
Abstract
Reinforcement Learning from Verifiable Rewards (RLVR) has been widely adopted as the de facto method for enhancing the reasoning capabilities of large language models and has demonstrated notable success in verifiable domains like math and competitive programming tasks. However, the efficacy of RLVR diminishes significantly when applied to agentic environments. These settings, characterized by multi-step, complex problem solving, lead to high failure rates even for frontier LLMs, as the reward landscape is too sparse for effective model training via conventional RLVR. In this work, we introduce Agent-RLVR, a framework that makes RLVR effective in challenging agentic settings, with an initial focus on software engineering tasks. Inspired by human pedagogy, Agent-RLVR introduces agent guidance, a mechanism that actively steers the agent towards successful trajectories by leveraging…
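The sparsity argument in the abstract can be illustrated with a rough sketch. It assumes a binary reward that is 1.0 only when every check on a task passes (a common convention for verifiable rewards, not confirmed by the text above) and treats rollouts as independent attempts with a fixed success probability, which is a simplification since pass@1 is an average over tasks. The function names are hypothetical.

```python
def verifiable_reward(test_results: list[bool]) -> float:
    """Binary reward: 1.0 only if there is at least one check and all pass.

    Assumed convention for illustration; the paper's exact reward may differ.
    """
    return 1.0 if test_results and all(test_results) else 0.0


def prob_no_success(p_success: float, k: int) -> float:
    """Probability that k independent rollouts all receive zero reward."""
    return (1.0 - p_success) ** k


# Using the baseline pass@1 of 9.4% as a stand-in per-rollout success rate.
p_all_fail = prob_no_success(0.094, 8)
print(f"P(no rewarded rollout in 8 samples) = {p_all_fail:.3f}")
```

Under these assumptions, a substantial share of 8-rollout batches contains no rewarded trajectory at all, so there is nothing positive to reinforce. This is the gap that guidance is meant to close by steering the agent towards successful trajectories.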
Taxonomy
Topics: Multi-Agent Systems and Negotiation · Business Process Modeling and Analysis
