Improving LLM-Generated Code Quality with GRPO
Maxime Robeyns, Laurence Aitchison

TL;DR
This paper uses GRPO (Group Relative Policy Optimization) to improve the overall quality of LLM-generated code. It rewards a comprehensive code-quality metric rather than functional correctness alone.
Contribution
The paper develops a new library that quantifies multiple aspects of code quality. It shows that using this library's score as a GRPO reward improves the code that LLMs produce.
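To make "quantifying code quality" concrete, here is a minimal sketch of a static quality score in Python. It is a hypothetical toy, not the paper's library. The checks (docstring coverage as a maintainability proxy, direct `eval`/`exec` calls as a safety red flag) and the equal weighting are assumptions chosen for illustration. The name `toy_quality_score` is also hypothetical.

```python
import ast

# Hypothetical toy checks; NOT the paper's library or its actual metric.
UNSAFE_CALLS = {"eval", "exec"}


def toy_quality_score(source: str) -> float:
    """Return a score in [0, 1] from two illustrative static checks."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return 0.0  # code that does not parse gets the minimum score

    funcs = [n for n in ast.walk(tree)
             if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]
    # Maintainability proxy: share of functions that carry a docstring.
    if funcs:
        doc_ratio = sum(ast.get_docstring(f) is not None for f in funcs) / len(funcs)
    else:
        doc_ratio = 1.0

    # Safety proxy: any direct call to eval/exec zeroes this component.
    unsafe = any(isinstance(n, ast.Call)
                 and isinstance(n.func, ast.Name)
                 and n.func.id in UNSAFE_CALLS
                 for n in ast.walk(tree))
    safety = 0.0 if unsafe else 1.0

    # Equal-weight average of the two components (an arbitrary choice).
    return 0.5 * doc_ratio + 0.5 * safety
```

A real quality library would combine many more signals. The point here is only that each check maps source code to a number, so the combined score can serve as a scalar reward.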
Findings
GRPO increases code quality according to the new metric
Expert annotators confirm improved code quality
Code safety and maintainability are enhanced
Abstract
Large Language Models (LLMs) are gaining widespread use for code generation. Recent training procedures use execution feedback as a reward signal, typically focusing on the functional correctness of the code as measured by unit test pass rate. However, this reward signal fails to capture notions of maintainability, quality and safety of the code produced. We address this under-explored area and develop a comprehensive library to quantify various aspects of code quality, and use it as a reward in GRPO. We find GRPO increases code quality according to this measure, which is confirmed by expert, blinded human annotators.
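The sketch below shows how a quality score can enter GRPO. It uses GRPO's standard group-relative advantage: sample several completions per prompt, then normalize each completion's reward by the group's mean and standard deviation. The composite reward that blends unit-test pass rate with a quality score, and its 0.5 weight, are hypothetical assumptions rather than the paper's exact reward. This sketch uses the population standard deviation, and other implementations differ on that choice.

```python
import statistics


def composite_reward(pass_rate: float, quality: float, weight: float = 0.5) -> float:
    """Hypothetical blend of functional correctness and a quality score."""
    return (1.0 - weight) * pass_rate + weight * quality


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantage: (r - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mean) / (std + eps) for r in rewards]


# One prompt, four sampled completions: (unit-test pass rate, quality score).
group = [(1.0, 0.9), (1.0, 0.2), (0.0, 0.8), (1.0, 0.5)]
rewards = [composite_reward(p, q) for p, q in group]
advantages = group_relative_advantages(rewards)
```

Completions 0, 1 and 3 all pass their tests. A correctness-only reward would give them identical advantages. With the quality term, the cleaner completion 0 receives the largest advantage, so the policy update favours it over completion 1.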
Taxonomy
Topics: Natural Language Processing Techniques · Digital Rights Management and Security · Mathematics, Computing, and Information Processing
