StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback

Shihan Dou; Yan Liu; Haoxiang Jia; Limao Xiong; Enyu Zhou; Wei Shen; Junjie Shan; Caishuang Huang; Xiao Wang; Xiaoran Fan; Zhiheng Xi; Yuhao Zhou; Tao Ji; Rui Zheng; Qi Zhang; Xuanjing Huang; Tao Gui

arXiv:2402.01391 · cs.SE · February 6, 2024 · 2 cites

TL;DR

StepCoder introduces a reinforcement learning framework with curriculum-based exploration and fine-grained optimization for improved code generation, leveraging a new dataset to outperform existing methods.

Contribution

The paper presents a novel RL framework that combines curriculum learning with fine-grained optimization, along with a new dataset, to improve the quality of code generated by LLMs.

Findings

01. Outperforms state-of-the-art methods on code generation benchmarks.

02. Improves exploration of the output space for complex code.

03. Handles long code sequences effectively through curriculum learning.

Abstract

The advancement of large language models (LLMs) has significantly propelled the field of code generation. Previous work integrated reinforcement learning (RL) with compiler feedback to explore the output space of LLMs and enhance code generation quality. However, the lengthy code that LLMs generate in response to complex human requirements makes RL exploration a challenge. Also, since unit tests may not cover all of the complicated code, optimizing LLMs on these unexecuted code snippets is ineffective. To tackle these challenges, we introduce StepCoder, a novel RL framework for code generation consisting of two main components: CCCS addresses the exploration challenge by breaking the long-sequence code generation task into a Curriculum of Code Completion Subtasks, while FGO optimizes the model only on executed code by masking the unexecuted code segments, providing Fine-Grained Optimization. In…
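The two components named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `curriculum_prompts` and `masked_mean_loss`, the linear schedule for shrinking the revealed reference prefix, and the per-token `executed` flags are all assumptions made for illustration, since the abstract does not specify these details.

```python
from typing import List


def curriculum_prompts(task: str, solution_lines: List[str], stages: int) -> List[str]:
    """Hypothetical CCCS-style curriculum: easiest prompt first, hardest last.

    Early stages reveal most of a reference solution so the model only has
    to complete the tail; the final stage reveals nothing. The linear
    schedule here is an assumption, not taken from the paper.
    """
    if stages < 1:
        raise ValueError("stages must be at least 1")
    n = len(solution_lines)
    prompts = []
    for stage in range(stages + 1):
        # Number of reference lines revealed shrinks linearly to zero.
        keep = n * (stages - stage) // stages
        prefix = "\n".join(solution_lines[:keep])
        prompts.append(task + "\n" + prefix if prefix else task)
    return prompts


def masked_mean_loss(token_losses: List[float], executed: List[bool]) -> float:
    """Hypothetical FGO-style objective: average loss over executed tokens only.

    Tokens in code that the unit tests never ran are masked out, so they
    contribute nothing to the optimization signal.
    """
    if len(token_losses) != len(executed):
        raise ValueError("token_losses and executed must have equal length")
    kept = [loss for loss, ran in zip(token_losses, executed) if ran]
    return sum(kept) / len(kept) if kept else 0.0
```

For example, `curriculum_prompts("T", ["a", "b", "c", "d"], 2)` yields three prompts that reveal four, two, and zero reference lines, and `masked_mean_loss([1.0, 2.0, 3.0], [True, False, True])` averages only the first and third losses.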

Code & Models

Repositories

ablustrund/apps_plus (Official)

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Software Testing and Debugging Techniques · Fuzzy Logic and Control Systems