RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning

Jonas Gehring; Kunhao Zheng; Jade Copet; Vegard Mella; Quentin Carbonneaux; Taco Cohen; Gabriel Synnaeve

arXiv:2410.02089·cs.CL·February 19, 2025·6 cites


Open Access 5 Models

TL;DR

This paper introduces RLEF, a reinforcement learning approach that enables code-generating language models to effectively utilize execution feedback, significantly improving their iterative coding capabilities and achieving state-of-the-art results in competitive programming.

Contribution

The paper presents a novel end-to-end reinforcement learning method that enhances code synthesis in LLMs by grounding their outputs in execution feedback, outperforming existing approaches.

Findings

01. Achieves new state-of-the-art results on competitive programming tasks.

02. Reduces the number of samples needed by an order of magnitude.

03. Models effectively leverage automatic feedback over multiple steps.

Abstract

Large language models (LLMs) deployed as agents solve user-specified tasks over multiple steps while keeping the required manual engagement to a minimum. Crucially, such LLMs need to ground their generations in any feedback obtained to reliably achieve the desired outcomes. We propose an end-to-end reinforcement learning method for teaching models to leverage execution feedback in the realm of code synthesis, where state-of-the-art LLMs struggle to improve code iteratively compared to independent sampling. We benchmark on competitive programming tasks, where we achieve new state-of-the-art results with both small (8B parameters) and large (70B) models while reducing the number of samples required by an order of magnitude. Our analysis of inference-time behavior demonstrates that our method produces LLMs that effectively leverage automatic feedback over multiple steps.
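The abstract does not spell out the interaction loop, so the following is only a minimal sketch of what "leveraging execution feedback over multiple steps" could look like as a single episode. Everything named here is hypothetical: `run_tests`, `episode`, the `solve` entry point, the split into feedback tests and reward tests, and the binary end-of-episode reward are assumptions made for illustration, not the authors' implementation. The reinforcement learning update itself is not shown; `propose` stands in for the model being trained.

```python
from typing import Callable, List, Tuple

Test = Tuple[tuple, object]  # (positional arguments, expected return value)


def run_tests(source: str, tests: List[Test], entry: str = "solve") -> List[str]:
    """Execute `source` and return one message per failing test (empty = all pass)."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # illustration only: no sandboxing or time limits
        fn = namespace[entry]
    except Exception as exc:
        return [f"could not load solution: {exc!r}"]
    failures = []
    for args, expected in tests:
        try:
            got = fn(*args)
        except Exception as exc:
            failures.append(f"{entry}{args} raised {exc!r}")
            continue
        if got != expected:
            failures.append(f"{entry}{args} returned {got!r}, expected {expected!r}")
    return failures


def episode(
    propose: Callable[[List[str]], str],  # hypothetical stand-in for the LLM policy
    feedback_tests: List[Test],           # results are shown to the policy
    reward_tests: List[Test],             # assumed held out; only used for the reward
    max_turns: int = 3,
) -> Tuple[float, int]:
    """Run one multi-turn episode and return (reward, number of turns used)."""
    history: List[str] = []  # execution feedback accumulated so far
    source = ""
    turns = 0
    for turns in range(1, max_turns + 1):
        source = propose(history)
        failures = run_tests(source, feedback_tests)
        if not failures:
            break  # nothing left to report back to the policy
        history.append("\n".join(failures))
    # Assumed reward scheme: 1.0 only if the final attempt passes every reward test.
    reward = 1.0 if not run_tests(source, reward_tests) else 0.0
    return reward, turns


def scripted_policy(history: List[str]) -> str:
    """Toy policy: a buggy first attempt, then a fix once feedback arrives."""
    if not history:
        return "def solve(a, b):\n    return a - b\n"
    return "def solve(a, b):\n    return a + b\n"


if __name__ == "__main__":
    print(episode(scripted_policy, [((1, 2), 3)], [((2, 2), 4), ((0, 5), 5)]))
```

In a training setup along these lines, the scalar returned by `episode` would be the signal a policy-gradient method optimizes, which is what would push the model to actually use the failure messages instead of resampling independently.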


Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Security and Verification in Computing