Self-Correcting Code Generation Using Small Language Models
Jeonghun Cho, Deokhyung Kang, Hyounghun Kim, Gary Geunbae Lee

TL;DR
This paper introduces CoCoS, a reinforcement learning approach that trains small language models to self-correct their code outputs through multi-turn refinement, improving accuracy on MBPP and HumanEval.
Contribution
The paper presents CoCoS, a novel reinforcement learning method that enhances small language models' ability to self-correct code by training with an accumulated reward function.
Findings
CoCoS improves code generation accuracy by 35.8% on MBPP.
CoCoS enhances performance by 27.7% on HumanEval.
Small models struggle with self-reflection without specialized training.
Abstract
Self-correction has demonstrated potential in code generation by allowing language models to revise and improve their outputs through successive refinement. Recent studies have explored prompting-based strategies that incorporate verification or feedback loops using proprietary models, as well as training-based methods that leverage their strong reasoning capabilities. However, whether smaller models possess the capacity to effectively guide their outputs through self-reflection remains unexplored. Our findings reveal that smaller models struggle to exhibit reflective revision behavior across both self-correction paradigms. In response, we introduce CoCoS, an approach designed to enhance the ability of small language models for multi-turn code correction. Specifically, we propose an online reinforcement learning objective that trains the model to confidently maintain correct outputs…
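The multi-turn correction setting and the accumulated reward mentioned above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's implementation: the function names (`pass_fraction`, `accumulated_reward`, `rollout`), the use of the unit-test pass fraction as the per-turn reward, the discount factor `gamma`, and the revision prompt wording are all hypothetical. The sketch only shows how per-turn rewards from a revision trajectory could be combined into one training signal, so that keeping a correct answer and fixing a wrong one both score well.

```python
from typing import Callable, List, Sequence, Tuple


def pass_fraction(code: str, tests: Sequence[str]) -> float:
    """Hypothetical per-turn reward: fraction of assert-style tests that pass.

    Note: exec() runs arbitrary code with no timeout; a real setup would
    sandbox the candidate program.
    """
    if not tests:
        return 0.0
    passed = 0
    for test in tests:
        env: dict = {}
        try:
            exec(code, env)  # define the candidate function(s)
            exec(test, env)  # run one assertion against them
            passed += 1
        except Exception:
            pass  # any error (including SyntaxError) counts as a failed test
    return passed / len(tests)


def accumulated_reward(turn_rewards: Sequence[float], gamma: float = 1.0) -> float:
    """Assumed form: (optionally discounted) sum of rewards over all turns."""
    return sum((gamma ** t) * r for t, r in enumerate(turn_rewards))


def rollout(
    generate: Callable[[str], str],
    prompt: str,
    tests: Sequence[str],
    max_turns: int = 2,
) -> Tuple[List[str], List[float]]:
    """Generate an initial attempt, then ask the model to revise it."""
    attempts: List[str] = []
    rewards: List[float] = []
    context = prompt
    for _ in range(max_turns):
        code = generate(context)
        attempts.append(code)
        rewards.append(pass_fraction(code, tests))
        # Hypothetical revision prompt; no test results are shown to the model.
        context = (
            f"{prompt}\n\nPrevious attempt:\n{code}\n\n"
            "Revise the attempt if it is wrong; otherwise keep it unchanged."
        )
    return attempts, rewards


if __name__ == "__main__":
    # Stub "model": a wrong first attempt followed by a corrected one.
    outputs = iter([
        "def add(a, b):\n    return a - b",
        "def add(a, b):\n    return a + b",
    ])
    tests = ["assert add(1, 2) == 3", "assert add(0, 0) == 0"]
    attempts, rewards = rollout(lambda ctx: next(outputs), "Write add(a, b).", tests)
    print(rewards, accumulated_reward(rewards))
```

In a training loop, the scalar returned by `accumulated_reward` would stand in for the trajectory-level reward passed to a policy-gradient update; the choice of update rule is outside this sketch.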
Taxonomy
Topics: Software Engineering Research · Topic Modeling · Natural Language Processing Techniques
