Beyond Verifiable Rewards: Rubric-Based GRM for Reinforced Fine-Tuning SWE Agents

Jiawei Huang; Qingping Yang; Renjie Zheng; Jiaze Chen

arXiv:2604.16335·cs.LG·April 21, 2026

TL;DR

This paper introduces a rubric-based Generative Reward Model (GRM) that gives richer feedback than binary pass/fail signals when fine-tuning LLM agents on software engineering tasks. The feedback shapes intermediate behavior and improves final test accuracy.

Contribution

The paper proposes a rubric-based GRM, equipped with human-designed rubrics, to guide Reinforced Fine-Tuning (RFT) of LLM agents on SWE tasks. The approach outperforms training on verifiable terminal rewards alone.

Findings

01. The rubric-based GRM suppresses undesirable behaviors more effectively than terminal rewards alone.

02. It promotes beneficial behaviors during training.

03. Final test accuracy improves with the proposed method.

Abstract

Despite recent progress in Large Language Model (LLM) Agents for Software Engineering (SWE) tasks, end-to-end fine-tuning typically relies on verifiable terminal rewards such as whether all unit tests pass. While these binary signals reflect whether the final solution is correct, they provide little guidance for shaping intermediate behaviors during multi-step interactions, thereby limiting improvements in the overall quality of the resolution process. To address this, we introduce a rubric-based Generative Reward Model (GRM) that provides richer learning signals. The GRM is equipped with human-designed rubrics that indicate criteria for encouraging or discouraging specific behavioral patterns, and we leverage this feedback for high-quality training data collection via trajectory filtration. When used for Reinforced Fine-Tuning (RFT) on SWE Tasks, our approach outperforms…
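The trajectory filtration step mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the `Rubric` and `Trajectory` types, the rubric names, the boolean GRM verdicts, the "fraction of rubrics satisfied" score, and the choice to require passing tests in addition to the rubric threshold are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Rubric:
    """One human-designed criterion (hypothetical structure)."""
    name: str
    encourage: bool  # True: behavior to encourage; False: behavior to discourage


@dataclass
class Trajectory:
    """One agent rollout on a SWE task (hypothetical structure)."""
    task_id: str
    tests_passed: bool  # verifiable terminal reward
    # Assumed GRM output: rubric name -> whether the behavior was observed.
    verdicts: Dict[str, bool] = field(default_factory=dict)


def rubric_score(traj: Trajectory, rubrics: List[Rubric]) -> float:
    """Fraction of rubrics satisfied by the trajectory.

    An encouraged behavior is satisfied when observed; a discouraged
    behavior is satisfied when not observed. Missing verdicts count
    as "not observed".
    """
    if not rubrics:
        return 0.0
    satisfied = sum(
        1 for r in rubrics if traj.verdicts.get(r.name, False) == r.encourage
    )
    return satisfied / len(rubrics)


def filter_trajectories(
    trajs: List[Trajectory], rubrics: List[Rubric], min_score: float = 1.0
) -> List[Trajectory]:
    """Keep trajectories that pass the tests AND meet the rubric threshold."""
    return [
        t for t in trajs
        if t.tests_passed and rubric_score(t, rubrics) >= min_score
    ]


# Illustrative rubrics and rollouts (names are made up for this sketch).
rubrics = [
    Rubric("reproduces_bug_before_fix", encourage=True),
    Rubric("edits_test_files", encourage=False),
]
trajs = [
    Trajectory("a", True, {"reproduces_bug_before_fix": True, "edits_test_files": False}),
    Trajectory("b", True, {"reproduces_bug_before_fix": True, "edits_test_files": True}),
    Trajectory("c", False, {"reproduces_bug_before_fix": True, "edits_test_files": False}),
]

print([t.task_id for t in filter_trajectories(trajs, rubrics)])       # ['a']
print([t.task_id for t in filter_trajectories(trajs, rubrics, 0.5)])  # ['a', 'b']
```

In this sketch, filtering on the terminal reward alone would keep both "a" and "b"; the rubric threshold additionally drops "b", which passed the tests while exhibiting a discouraged behavior.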
