CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning
Hung Le, Yue Wang, Akhilesh Deepak Gotmare, Silvio Savarese, Steven C.H. Hoi

TL;DR
CodeRL integrates pretrained language models with deep reinforcement learning, using a critic network and feedback from unit tests to significantly improve program synthesis performance, especially on complex unseen tasks.
Contribution
The paper introduces CodeRL, a novel framework combining pretrained models and reinforcement learning with a critic network for improved code generation.
Findings
Achieves new SOTA on APPS benchmark.
Demonstrates strong zero-shot transfer on MBPP.
Enhances code generation with critic feedback and critical sampling.
Abstract
Program synthesis or code generation aims to generate a program that satisfies a problem specification. Recent approaches using large-scale pretrained language models (LMs) have shown promising results, yet they have some critical limitations. In particular, they often follow a standard supervised fine-tuning procedure to train a code generation model only from the pairs of natural-language problem descriptions and ground-truth programs. Such a paradigm largely ignores some important but potentially useful signals in the problem specification such as unit tests, which often results in poor performance when solving complex unseen coding tasks. To address these limitations, we propose "CodeRL", a new framework for program synthesis tasks through pretrained LMs and deep reinforcement learning (RL). Specifically, during training, we treat the code-generating LM as an actor network, and…
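The unit-test signal the abstract refers to can be illustrated with a small sketch. The Python below is not the authors' implementation: the function name `unit_test_reward`, the four reward levels, and the stdin/stdout test format are all assumptions chosen for illustration. It only shows how running a generated program against unit tests could be turned into a scalar feedback value.

```python
import subprocess
import sys
from typing import Sequence, Tuple

# Hypothetical reward levels, chosen for illustration only.
REWARD_COMPILE_ERROR = -1.0
REWARD_RUNTIME_ERROR = -0.6
REWARD_FAILED_TEST = -0.3
REWARD_PASSED_ALL = 1.0


def unit_test_reward(
    program: str,
    tests: Sequence[Tuple[str, str]],
    timeout: float = 5.0,
) -> float:
    """Map the outcome of running `program` on (stdin, expected stdout) pairs to a scalar."""
    # Step 1: a program that does not even parse gets the lowest reward.
    try:
        compile(program, "<candidate>", "exec")
    except (SyntaxError, ValueError):
        return REWARD_COMPILE_ERROR

    # Step 2: run each test in a fresh interpreter process.
    for stdin_text, expected in tests:
        try:
            result = subprocess.run(
                [sys.executable, "-c", program],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            # Treat a timeout like a runtime failure in this sketch.
            return REWARD_RUNTIME_ERROR
        if result.returncode != 0:
            return REWARD_RUNTIME_ERROR
        if result.stdout.strip() != expected.strip():
            return REWARD_FAILED_TEST

    # Step 3: every test produced the expected output.
    return REWARD_PASSED_ALL
```

In an actor-critic setup of the kind the abstract describes, a scalar like this would serve as the feedback for a program sampled from the actor. How that feedback is combined with the critic is not covered by the truncated abstract and is not sketched here.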
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Software Reliability and Analysis Research
Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Gated Linear Unit · Softmax · Multi-Head Attention · Residual Connection · SentencePiece · Attention Dropout · Dense Connections
