Rethinking Repetition Problems of LLMs in Code Generation

Yihong Dong; Yuchen Liu; Xue Jiang; Zhi Jin; Ge Li

arXiv:2505.10402·cs.CL·May 16, 2025

TL;DR

This paper addresses structural repetition in code generated by LLMs by introducing RPG, a grammar-based decoding method that reduces repetitions and improves code quality, supported by a new evaluation dataset and extensive experiments.

Contribution

The paper formally defines structural repetition in code generation and proposes RPG, a novel grammar-based decoding approach to mitigate this issue in LLMs.

Findings

01. RPG significantly reduces structural repetitions in generated code.

02. RPG outperforms existing baselines on multiple benchmarks.

03. The new dataset CodeRepetEval enables comprehensive evaluation of repetition mitigation methods.

Abstract

With the advent of neural language models, the performance of code generation has been significantly boosted. However, the problem of repetitions during the generation process continues to linger. Previous work has primarily focused on content repetition, which is merely a fraction of the broader repetition problem in code generation. A more prevalent and challenging problem is structural repetition. In structural repetition, the repeated code appears in various patterns but possesses a fixed structure, which can be inherently reflected in grammar. In this paper, we formally define structural repetition and propose an efficient decoding approach called RPG, which stands for Repetition Penalization based on Grammar, to alleviate the repetition problems in code generation for LLMs. Specifically, RPG first leverages grammar rules to identify repetition problems during code generation, and…
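
The distinction between content repetition and structural repetition can be made concrete with a small sketch. The code below is not the RPG algorithm, whose details are not given on this page; it is only an illustration, using Python's standard `ast` module, of how statements that differ in text can still share one grammatical structure. The helper names `shape` and `longest_structural_run` and the sample snippet are hypothetical and chosen for this example.

```python
import ast


def shape(node: ast.AST) -> str:
    """Describe a node by its node types only, ignoring names and constants."""
    children = ",".join(shape(child) for child in ast.iter_child_nodes(node))
    return f"{type(node).__name__}({children})"


def longest_structural_run(source: str) -> int:
    """Longest run of consecutive top-level statements with the same shape."""
    best = run = 0
    previous = None
    for statement in ast.parse(source).body:
        signature = shape(statement)
        run = run + 1 if signature == previous else 1
        best = max(best, run)
        previous = signature
    return best


# Hypothetical sample: no line is repeated verbatim, so a purely textual
# (content) check sees nothing, yet three statements share one structure.
sample = "x1 = f(1)\nx2 = g(2)\nx3 = h(3)\nprint(x3)\n"

lines = sample.splitlines()
has_content_repetition = len(set(lines)) != len(lines)
structural_run = longest_structural_run(sample)
```

Here `has_content_repetition` is false while `structural_run` counts the three assignments as one repeated pattern. A decoding-time method would have to detect such patterns incrementally on partial output rather than on a complete, parseable program as this sketch does.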

Code & Models

Repositories

lyc127/rpg
PyTorch · Official

Videos

Rethinking Repetition Problems of LLMs in Code Generation

Taxonomy

Topics: Natural Language Processing Techniques · Mathematics, Computing, and Information Processing