Signal Reshaping for GRPO in Weak-Feedback Agentic Code Repair

Jia Li, Yuxin Su, Ting Peng, Hailiang Huang, Yuetang Deng, and Michael R. Lyu

arXiv:2605.07276 · cs.AI · May 11, 2026

TL;DR

This paper studies signal reshaping for standard GRPO in weak-feedback agentic code repair, where rollout-time signals are executable but capture only necessary or surface conditions for task success. The full signal-reshaped setup raises accuracy from 0.385 to 0.535.

Contribution

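The paper proposes a minimal signal-reshaping construction that leaves GRPO's group-normalized advantage unchanged: layered compile-and-semantic rewards recover semantic ranking across trajectories, and step-level process scores, applied outside group reward normalization, localize credit within a trajectory.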

Findings

01. Full signal-reshaped GRPO improves accuracy from 0.385 to 0.535.

02. Layered rewards and process-score weighting further enhance accuracy and efficiency.

03. Compared to token-level distillation, the proposed method better captures outcome semantics and control.
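
For scale, the headline figures in the first finding work out to an absolute gain of 0.150, or roughly 39% relative to the 0.385 starting point. This summary does not state which baseline or benchmark the two numbers refer to, so the calculation below is only arithmetic on the reported values:

```latex
\Delta = 0.535 - 0.385 = 0.150,
\qquad
\frac{\Delta}{0.385} \approx 0.39
```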

Abstract

Code-agent RL often receives weak feedback: rollout-time signals are reliable and executable, but capture only necessary or surface conditions for task success rather than the target semantic predicate. Using agentic compile-fix as the setting, we study signal reshaping for standard GRPO under such feedback. Our central claim is that GRPO's within-group comparison is meaningful only after three kinds of signals are reshaped: outcome rewards recover semantic ranking, process signals localize intra-trajectory credit, and rollouts from the same prompt remain execution-comparable. We operationalize these conditions with a minimal signal-reshaping construction that leaves GRPO's group-normalized advantage construction unchanged: compile-and-semantic layered rewards reshape trajectory ranking, step-level process scores outside group reward normalization reshape within-trajectory update…
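
The abstract's three ingredients can be illustrated with a small sketch. This is not the paper's implementation: the function names, the 0.3 partial-credit value, and the multiplicative use of process scores are all assumptions made for illustration. Only the group-normalized advantage (reward minus group mean, divided by group standard deviation) follows the standard GRPO definition.

```python
from statistics import mean, pstdev


def layered_reward(compiles: bool, semantic_ok: bool) -> float:
    """Hypothetical two-layer outcome reward (values are illustrative).

    Compiling is treated as necessary but not sufficient; a semantically
    correct fix ranks above a fix that merely compiles.
    """
    if not compiles:
        return 0.0
    return 1.0 if semantic_ok else 0.3


def group_advantages(rewards, eps=1e-6):
    """Standard GRPO-style advantage: normalize rewards within one prompt's group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def step_weighted_advantages(advantage, process_scores):
    """Assumed scheme: scale one trajectory-level advantage by per-step scores.

    The scores are applied after group normalization, so they do not change
    how trajectories are ranked against each other.
    """
    return [advantage * s for s in process_scores]


# Four rollouts from the same prompt: (compiles, semantically correct).
rollouts = [(True, True), (True, False), (False, False), (True, False)]

# Compile-only feedback cannot separate rollout 0 from rollouts 1 and 3.
compile_only = [1.0 if c else 0.0 for c, _ in rollouts]
weak_adv = group_advantages(compile_only)

# Layered rewards restore the semantic ranking inside the group.
rewards = [layered_reward(c, s) for c, s in rollouts]
advs = group_advantages(rewards)

# Hypothetical process scores for the three steps of rollout 0.
step_advs = step_weighted_advantages(advs[0], [0.2, 1.0, 0.5])
```

With compile-only rewards, the semantically correct rollout receives the same advantage as the rollouts that merely compile, which is the weak-feedback problem the abstract describes. With the layered reward, it receives the largest advantage in the group, while the normalization step itself is untouched.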
