BoostAPR: Boosting Automated Program Repair via Execution-Grounded Reinforcement Learning with Dual Reward Models

Yuanhao Li; Hongbo Wang; Xiaotang Shang; Xunzhu Tang; Yiming Cao; Xuhong Chen

arXiv:2605.09134 · cs.AI · May 14, 2026

TL;DR

BoostAPR introduces a reinforcement learning framework with dual reward models and reward redistribution to improve automated program repair across multiple benchmarks.

Contribution

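The paper proposes a three-stage training pipeline: supervised fine-tuning on execution-verified demonstrations, training of dual reward models (a sequence-level assessor and a line-level credit allocator), and PPO optimization in which the line-level model redistributes rewards to the critical edit regions.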

Findings

01

BoostAPR achieves 40.7% on SWE-bench Verified, 22.9 percentage points above the base model.

02

It attains 24.8% on Defects4J in a Python-to-Java transfer setting.

03

It achieves 84.5% on HumanEval-Java and 95.0% on QuixBugs, indicating that the gains generalize beyond the SWE-Gym training data.

Abstract

Reinforcement learning for program repair is hindered by sparse execution feedback and coarse sequence-level rewards that obscure which edits actually fix bugs. We present BoostAPR, a three-stage framework addressing these challenges: (1) supervised fine-tuning on execution-verified demonstrations with reasoning traces, (2) training dual reward models--a sequence-level assessor and a line-level credit allocator--from execution outcomes, and (3) PPO optimization where the line-level model redistributes rewards to critical edit regions. This line-level credit assignment operates at an intermediate granularity naturally suited to code changes. Trained on SWE-Gym and evaluated on four benchmarks, BoostAPR achieves 40.7% on SWE-bench Verified (+22.9pp over base model), 24.8% on Defects4J (Python-to-Java transfer), 84.5% on HumanEval-Java, and 95.0% on QuixBugs, achieving competitive results…
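To make the idea of line-level credit assignment concrete, here is a minimal sketch of how a single sequence-level reward might be spread over the tokens of a patch in proportion to per-line credit scores. It is an illustration under stated assumptions, not the paper's method: the abstract does not give the redistribution formula, so the proportional weighting, the function name `redistribute_reward`, and its parameters are all hypothetical.

```python
from typing import List


def redistribute_reward(
    seq_reward: float,
    line_scores: List[float],
    tokens_per_line: List[int],
) -> List[float]:
    """Spread one sequence-level reward over tokens, weighted by line credit.

    Hypothetical scheme (an assumption, not taken from the paper): each line
    receives a share of the reward proportional to its credit score, and that
    share is split evenly among the line's tokens. The per-token rewards
    always sum to seq_reward.
    """
    if len(line_scores) != len(tokens_per_line):
        raise ValueError("line_scores and tokens_per_line must align")
    if not line_scores:
        raise ValueError("at least one line is required")
    if any(n <= 0 for n in tokens_per_line):
        raise ValueError("every line must contain at least one token")
    if any(s < 0 for s in line_scores):
        raise ValueError("credit scores must be non-negative")

    total = sum(line_scores)
    if total == 0:
        # No line stands out: fall back to uniform credit across lines.
        weights = [1.0 / len(line_scores)] * len(line_scores)
    else:
        weights = [s / total for s in line_scores]

    token_rewards: List[float] = []
    for weight, n_tokens in zip(weights, tokens_per_line):
        # Each token in a line gets an equal slice of that line's share.
        token_rewards.extend([seq_reward * weight / n_tokens] * n_tokens)
    return token_rewards


# Example: a three-line patch where the middle line carries most of the credit.
rewards = redistribute_reward(1.0, [1.0, 8.0, 1.0], [2, 4, 2])
```

In this sketch the tokens of the middle line each receive a larger reward than the tokens of the surrounding lines, while the total reward is unchanged. In a PPO setup, such per-token rewards would stand in for a single reward placed at the end of the sequence.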
