Multi-Turn Code Generation Through Single-Step Rewards

Arnav Kumar Jain; Gonzalo Gonzalez-Pumariega; Wayne Chen; Alexander M Rush; Wenting Zhao; Sanjiban Choudhury

arXiv:2502.20380 · cs.LG · June 30, 2025

TL;DR

This paper introduces μCode, a scalable method for multi-turn code generation that trains with only single-step rewards. It avoids the complex hierarchical reinforcement learning used by existing approaches and improves on their performance.

Contribution

The paper presents μCode, an approach that treats multi-turn code generation as a one-step recoverable MDP, which makes it possible to train effectively with only single-step rewards.

Findings

01. μCode outperforms state-of-the-art baselines in experiments.

02. The approach makes effective use of execution feedback during code generation.

03. Analysis of the reward-model and policy design choices supports the simplicity and scalability of the method.

Abstract

We address the problem of code generation from multi-turn execution feedback. Existing methods either generate code without feedback or use complex, hierarchical reinforcement learning to optimize multi-turn rewards. We propose a simple yet scalable approach, μCode, that solves multi-turn code generation using only single-step rewards. Our key insight is that code generation is a one-step recoverable MDP, where the correct code can be recovered from any intermediate code state in a single turn. μCode iteratively trains both a generator to provide code solutions conditioned on multi-turn execution feedback and a verifier to score the newly generated code. Experimental evaluations show that our approach achieves significant improvements over the state-of-the-art baselines. We provide analysis of the design choices of the reward models and policy, and show the efficacy of μCode…
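
To make the generator/verifier interplay in the abstract concrete, the following is a minimal sketch of one plausible multi-turn loop, not the authors' implementation. It assumes that at each turn the generator proposes several candidates, the verifier picks the highest-scoring one, and the chosen code is executed against tests to produce feedback for the next turn. All names here (`run_tests`, `multi_turn_generate`, `toy_generator`, `toy_verifier`) are hypothetical, and the toy generator and verifier are hard-coded stand-ins for learned models.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]  # (code, execution feedback) per turn


def run_tests(code: str, tests: List[Tuple[int, int]]) -> Tuple[bool, str]:
    """Execute code that defines `solve` and report the first failure.

    Assumption: a real system would run this in a sandbox, not via exec.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)
        for x, expected in tests:
            got = namespace["solve"](x)
            if got != expected:
                return False, f"solve({x}) returned {got}, expected {expected}"
    except Exception as exc:
        return False, f"error: {exc!r}"
    return True, "all tests passed"


def multi_turn_generate(
    generator: Callable[[str, History, int], List[str]],
    verifier: Callable[[str, str], float],
    prompt: str,
    tests: List[Tuple[int, int]],
    max_turns: int = 3,
    n_samples: int = 2,
) -> Tuple[str, History]:
    """At each turn: sample candidates, keep the verifier's top pick, execute it."""
    history: History = []
    best_code = ""
    for _ in range(max_turns):
        candidates = generator(prompt, history, n_samples)
        # The verifier scores each newly generated candidate on its own.
        best_code = max(candidates, key=lambda c: verifier(prompt, c))
        passed, feedback = run_tests(best_code, tests)
        history.append((best_code, feedback))
        if passed:
            break
    return best_code, history


def toy_generator(prompt: str, history: History, n: int) -> List[str]:
    """Hypothetical stand-in: proposes better code once feedback exists."""
    if not history:
        pool = ["def solve(x):\n    return x + 1\n", "def solve(x):\n    return x\n"]
    else:
        pool = ["def solve(x):\n    return x * 2\n", "def solve(x):\n    return x + 1\n"]
    return pool[:n]


def toy_verifier(prompt: str, code: str) -> float:
    """Hypothetical stand-in for a learned scorer (a fixed heuristic)."""
    if "*" in code:
        return 1.0
    return 0.5 if "+" in code else 0.0
```

Calling `multi_turn_generate(toy_generator, toy_verifier, "double the input", [(2, 4), (3, 6)])` fails the first turn, feeds the failure message back, and succeeds on the second turn. The sketch covers only inference; the iterative training of the generator and verifier that the abstract mentions is not shown.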

Code & Models

Repositories

portal-cornell/mucode (Official)

Taxonomy

Topics: Software Engineering Research · Software Testing and Debugging Techniques · Machine Learning and Algorithms