Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization

Mingzhe Du, Luu Anh Tuan, Yue Liu, Yuhao Qing, Dong Huang, Xinyi He, Qian Liu, Zejun Ma, See-kiong Ng

arXiv:2505.23387 · cs.SE · June 4, 2025


TL;DR

This paper presents a reinforcement-learning-based framework that lets large language models iteratively improve code efficiency at test time using execution feedback. The optimized code is more likely to outperform human submissions in efficiency.

Contribution

It introduces a novel RL-based test-time optimization method in which LLMs improve the efficiency of their own code through iterative refinement driven by empirical performance feedback.
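The closed-loop idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the proposer is a hypothetical stub standing in for the LLM, candidates are plain Python functions rather than programs run in an isolated sandbox, and efficiency is measured only as wall-clock time over a hypothetical set of test cases.

```python
import time
from typing import Callable, List, Optional, Tuple

Candidate = Callable[[int], int]
TestCase = Tuple[int, int]  # hypothetical (input, expected_output) pair


def evaluate(fn: Candidate, tests: List[TestCase]) -> Optional[float]:
    """Return total wall-clock seconds over all tests, or None if any test fails."""
    start = time.perf_counter()
    for arg, expected in tests:
        try:
            if fn(arg) != expected:
                return None
        except Exception:
            return None
    return time.perf_counter() - start


def optimize(
    propose: Callable[[List[Optional[float]]], Candidate],
    tests: List[TestCase],
    rounds: int,
) -> Tuple[Optional[Candidate], List[Optional[float]]]:
    """Closed loop: propose, execute, record feedback, keep the fastest correct candidate."""
    best_fn: Optional[Candidate] = None
    best_time = float("inf")
    history: List[Optional[float]] = []
    for _ in range(rounds):
        fn = propose(history)  # the proposer sees all feedback gathered so far
        elapsed = evaluate(fn, tests)
        history.append(elapsed)
        if elapsed is not None and elapsed < best_time:
            best_fn, best_time = fn, elapsed
    return best_fn, history


# Hypothetical candidates a model might emit across rounds.
def slow_sum(n: int) -> int:
    return sum(range(n + 1))  # correct, O(n)


def buggy_sum(n: int) -> int:
    return n * (n - 1) // 2  # fast but wrong


def fast_sum(n: int) -> int:
    return n * (n + 1) // 2  # correct, O(1)


pool = [slow_sum, buggy_sum, fast_sum]
tests = [(n, n * (n + 1) // 2) for n in (10, 1_000, 1_000_000)]
best, history = optimize(lambda h: pool[len(h)], tests, rounds=len(pool))
```

Here the stub proposer ignores the feedback and replays a fixed pool, so `best` ends up as `fast_sum` and the `buggy_sum` round records `None`. In the framework the paper describes, the model itself would read the history and decide what to try next.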

Findings

1. Reinforcement learning with GRPO significantly boosts code efficiency metrics.
2. SFT and DPO quickly saturate in efficiency gains.
3. With GRPO, generated code is more likely to outperform human submissions in efficiency.
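The first finding rests on GRPO's group-relative advantage. GRPO (introduced with DeepSeekMath) samples a group of completions for each prompt and scores each one against the group's own statistics, A_i = (r_i - mean(r)) / std(r), instead of using a learned critic. The sketch below shows only that normalization step. The rewards are hypothetical placeholders, since the paper's exact efficiency reward is not described on this page, and the clipped policy-gradient update that consumes these advantages is omitted.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each completion's reward against its own sampling group.

    Completions above the group mean get positive advantages and are reinforced;
    those below get negative advantages. No learned value function is needed.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for G = 4 completions of one prompt
# (e.g. 0 = failed tests, larger = faster correct code).
advantages = group_relative_advantages([0.0, 1.0, 1.0, 2.0])
```

Because the baseline is the group itself, a group in which every completion earns the same reward produces zero advantage and therefore no learning signal; progress depends on the sampled solutions differing in correctness or speed.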

Abstract

Large Language Models (LLMs) generate functionally correct solutions but often fall short in code efficiency, a critical bottleneck for real-world deployment. In this paper, we introduce a novel test-time iterative optimization framework to address this, employing a closed-loop system where LLMs iteratively refine code based on empirical performance feedback from an execution sandbox. We explore three training strategies: Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Group Relative Policy Optimization (GRPO). Experiments on our Venus dataset and the APPS benchmark show that SFT and DPO rapidly saturate in efficiency gains. In contrast, GRPO, using reinforcement learning (RL) with execution feedback, continuously optimizes code performance, significantly boosting both pass@1 (from 47% to 62%) and the likelihood of outperforming human submissions in efficiency…
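The abstract reports two kinds of metric: pass@1 (47% to 62%) and the likelihood that generated code beats human submissions in efficiency. A common way to compute pass@k is the unbiased estimator from the HumanEval/Codex evaluation work, 1 - C(n-c, k) / C(n, k); whether this paper uses that estimator or a single greedy sample is not stated here. The `beats_human_fraction` helper is a hypothetical stand-in that assumes the efficiency metric is the share of human runtimes the model's solution strictly beats. The paper's exact definition may differ.

```python
from math import comb
from typing import List


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples of which c pass all tests."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def beats_human_fraction(model_runtime: float, human_runtimes: List[float]) -> float:
    """Hypothetical efficiency metric: share of human submissions strictly slower than the model."""
    if not human_runtimes:
        raise ValueError("need at least one human runtime")
    return sum(t > model_runtime for t in human_runtimes) / len(human_runtimes)


# With k = 1 the estimator reduces to c / n.
p1 = pass_at_k(n=10, c=3, k=1)
share = beats_human_fraction(1.0, [0.5, 1.5, 2.0, 1.0])
```

Ties count as not beating a human submission in this sketch; a percentile-style definition that gives half credit for ties would be an equally plausible design choice.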


Datasets

Elfsong/Venus (dataset)


Taxonomy

Topics: Code efficiency optimization · Reinforcement learning for LLMs

Methods: Supervised Fine-Tuning · Direct Preference Optimization · Group Relative Policy Optimization