ExecVerify: White-Box RL with Verifiable Stepwise Rewards for Code Execution Reasoning

Lingxiao Tang; He Ye; Zhaoyang Chu; Muyang Ye; Zhongxin Liu; Xiaoxue Ren; Lingfeng Bao

arXiv:2603.11226·cs.SE·March 13, 2026

Open Access

TL;DR

ExecVerify introduces a reinforcement learning approach with verifiable stepwise rewards for code execution reasoning, significantly improving smaller models' performance on reasoning and code generation benchmarks.

Contribution

The paper presents a novel white-box reward framework and a two-stage training pipeline that enhances code execution reasoning in smaller language models.

Findings

01. 7B model with ExecVerify matches 32B model performance on reasoning benchmarks.

02. Up to 5.9% improvement in pass@1 on code generation tasks.

03. Effective reinforcement learning with verifiable rewards enhances reasoning accuracy.

Abstract

Code LLMs still struggle with code execution reasoning, especially in smaller models. Existing methods rely on supervised fine-tuning (SFT) with teacher-generated explanations, primarily in two forms: (1) input-output (I/O) prediction chains and (2) natural-language descriptions of execution traces. However, intermediate execution steps cannot be explicitly verified during SFT, so the training objective can reduce to merely matching teacher explanations. Moreover, training data is typically collected without explicit control over task difficulty. We introduce ExecVerify, which goes beyond text imitation by incorporating verifiable white-box rewards derived from execution traces, including next-statement prediction and variable value/type prediction. Our work first builds a dataset with multiple difficulty levels via constraint-based program synthesis. Then, we apply reinforcement…
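
To make the idea of a verifiable white-box reward concrete, the sketch below records an execution trace with Python's standard `sys.settrace` hook and scores a single prediction about the next executed statement and a variable's value and type. This is an illustration only, not the authors' implementation: the helper names `collect_trace` and `stepwise_reward`, the equal 0.5/0.5 weighting, and the `running_max` example program are all assumptions made here.

```python
import sys


def collect_trace(func, *args):
    """Run func(*args); record (line offset, locals snapshot) before each executed line.

    Hypothetical helper for illustration. Line offsets are relative to the `def` line.
    """
    steps = []
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # ignore frames other than the traced function
        if event == "line":
            offset = frame.f_lineno - code.co_firstlineno
            steps.append((offset, dict(frame.f_locals)))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # always restore the earlier trace hook
    return result, steps


def stepwise_reward(steps, index, predicted_next_line, var_name, predicted_value):
    """Score one prediction about the program state after step `index`.

    Half credit for the next statement, half for the variable's value and type.
    The equal weighting is an assumption, not taken from the paper.
    """
    next_line, next_locals = steps[index + 1]
    line_ok = predicted_next_line == next_line
    value_ok = (
        var_name in next_locals
        and type(next_locals[var_name]) is type(predicted_value)
        and next_locals[var_name] == predicted_value
    )
    return 0.5 * line_ok + 0.5 * value_ok


def running_max(xs):
    best = xs[0]
    for x in xs:
        if x > best:
            best = x
    return best


result, steps = collect_trace(running_max, [1, 3])
# Prediction: after the first statement, execution moves to the `for` line
# (offset 2) and `best` holds the int 1.
reward = stepwise_reward(steps, 0, 2, "best", 1)
```

Because the trace comes from actually running the program, each prediction can be checked mechanically rather than compared against a teacher's explanation. The type check is kept separate from the equality check because Python treats `1 == 1.0` as true, so a prediction with the right value but the wrong type would otherwise receive credit.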

Taxonomy

Topics: Software Engineering Research · Software Testing and Debugging Techniques · Topic Modeling