Improving LLM-Generated Code Quality with GRPO

Maxime Robeyns; Laurence Aitchison

arXiv:2506.02211 · cs.AI · June 4, 2025

Open Access

TL;DR

This paper uses a comprehensive code quality metric as the reward in GRPO (Group Relative Policy Optimization) training, improving the overall quality of LLM-generated code beyond functional correctness alone.

Contribution

The paper develops a library that quantifies multiple aspects of code quality and shows that using its score as a GRPO reward improves the quality of code produced by LLMs.

Findings

01. GRPO with the quality reward increases code quality according to the new metric

02. Blinded expert human annotators confirm the improvement

03. The reward targets maintainability and safety, which unit test pass rate alone does not capture

Abstract

Large Language Models (LLMs) are gaining widespread use for code generation. Recent training procedures use execution feedback as a reward signal, typically focusing on the functional correctness of the code as measured by unit test pass rate. However, this reward signal fails to capture notions of maintainability, quality and safety of the code produced. We address this under-explored area and develop a comprehensive library to quantify various aspects of code quality, and use it as a reward in GRPO. We find that GRPO increases code quality according to this measure, which is confirmed by expert, blinded human annotators.
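To make the idea in the abstract concrete, here is a minimal sketch of how a quality score could be blended with unit test pass rate and turned into GRPO-style group-relative advantages. Everything named here is an assumption for illustration: `toy_quality_score`, `combined_reward`, the 0.5 weighting and the two sample completions are hypothetical and do not reproduce the paper's library or its reward definition. Only the group-wise standardisation of rewards reflects how GRPO is commonly described.

```python
import ast
import statistics


def toy_quality_score(source: str) -> float:
    """Hypothetical quality proxy in [0, 1]; NOT the paper's metric.

    Rewards documented functions and penalises heavy branching.
    """
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return 0.0
    funcs = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    if not funcs:
        return 0.0
    # Fraction of functions that carry a docstring.
    documented = sum(ast.get_docstring(f) is not None for f in funcs) / len(funcs)
    # Crude branching count, used as a stand-in for complexity.
    branches = sum(
        isinstance(n, (ast.If, ast.For, ast.While, ast.Try)) for n in ast.walk(tree)
    )
    simplicity = 1.0 / (1.0 + branches / len(funcs))
    return 0.5 * documented + 0.5 * simplicity


def combined_reward(pass_rate: float, quality: float, weight: float = 0.5) -> float:
    """Assumed linear blend of correctness and quality; the weight is illustrative."""
    return (1.0 - weight) * pass_rate + weight * quality


def group_relative_advantages(rewards, eps=1e-6):
    """Standardise each reward against the other samples drawn for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Two hypothetical completions for one prompt; assume both pass every unit test.
clean = 'def add(a, b):\n    """Return a + b."""\n    return a + b\n'
messy = "def add(a, b):\n    if a:\n        if b:\n            return a + b\n    return a + b\n"

rewards = [combined_reward(1.0, toy_quality_score(src)) for src in (clean, messy)]
advantages = group_relative_advantages(rewards)
```

In this sketch both completions are equally correct, so a reward based on pass rate alone would give them identical advantages. With the quality term included, the documented, branch-free version receives the positive advantage.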

Taxonomy

Topics: Natural Language Processing Techniques · Digital Rights Management and Security · Mathematics, Computing, and Information Processing