Execution-Grounded Credit Assignment for GRPO in Code Generation
Abhijit Kumar, Natalya Kumar, Shikhar Gupta

TL;DR
This paper introduces Execution-Grounded Credit Assignment (EGCA), a method that localizes credit assignment in code generation by analyzing execution traces, improving pass@1 on HumanEval and MBPP without a critic, auxiliary loss, or learned verifier.
Contribution
EGCA provides a critic-free, execution-based credit assignment method that enhances program synthesis performance by pinpointing semantic errors using execution traces.
Findings
Achieves 82.1% pass@1 on HumanEval (+3.1)
Achieves 68.9% pass@1 on MBPP (+1.5)
Adds 18% wall-clock overhead
Abstract
Critic-free reinforcement learning with verifiable rewards (RLVR) improves code generation by optimizing unit-test pass rates, but GRPO-style updates suffer from coarse credit assignment: a single outcome signal is spread uniformly across long programs even when failure stems from a localized semantic error. We propose Execution-Grounded Credit Assignment (EGCA), which localizes GRPO updates using execution traces. For programs that satisfy algorithmic constraints but fail tests, EGCA executes the candidate and a canonical reference solution (curated once offline; used for analysis, not supervision) under identical instrumentation, identifies the earliest semantic divergence, and assigns advantage only to the corresponding token span while masking downstream tokens. EGCA is a drop-in modification requiring no critic, auxiliary loss, or learned verifier, yielding 82.1% pass@1 on…
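The mechanism described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the trace format (a list of steps, each with a recorded `state` and a `token_span`), the function names, and the choice to zero out every token outside the divergent span are all assumptions made for illustration. The sketch standardizes rewards within a sampling group in the GRPO style, finds the first step where the candidate's recorded state differs from the reference's, and assigns the sequence-level advantage only to the tokens mapped to that step.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: standardize each reward within its sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def earliest_divergence(candidate_trace, reference_trace):
    """Return the index of the first step whose recorded state differs, or None."""
    for i, (cand, ref) in enumerate(zip(candidate_trace, reference_trace)):
        if cand["state"] != ref["state"]:
            return i
    # One trace ended early: treat the first unmatched step as the divergence.
    if len(candidate_trace) != len(reference_trace):
        return min(len(candidate_trace), len(reference_trace))
    return None


def localized_advantages(num_tokens, advantage, span):
    """Give `advantage` to tokens in [start, end); all other tokens get 0.0."""
    start, end = span
    return [advantage if start <= t < end else 0.0 for t in range(num_tokens)]


# Hypothetical group of four sampled programs: two pass the tests, two fail.
rewards = [1.0, 0.0, 1.0, 0.0]
advantages = group_relative_advantages(rewards)

# Hypothetical instrumented traces for the failing candidate at index 1.
# Each candidate step records which generated tokens produced it.
candidate_trace = [
    {"state": {"total": 0}, "token_span": (0, 4)},
    {"state": {"total": 2}, "token_span": (4, 9)},   # first wrong value
    {"state": {"total": 5}, "token_span": (9, 12)},  # wrong only as a consequence
]
reference_trace = [
    {"state": {"total": 0}},
    {"state": {"total": 1}},
    {"state": {"total": 3}},
]

step = earliest_divergence(candidate_trace, reference_trace)
span = candidate_trace[step]["token_span"]
token_advantages = localized_advantages(12, advantages[1], span)
```

In this toy case only tokens 4 through 8 carry the negative advantage. A uniform GRPO update would instead apply the same value to all 12 tokens, including the correct prefix and the downstream tokens that are wrong only because of the earlier error.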
Taxonomy
Topics: Software Testing and Debugging Techniques · Software Engineering Research · Reinforcement Learning in Robotics
