Rethinking Repetition Problems of LLMs in Code Generation
Yihong Dong, Yuchen Liu, Xue Jiang, Zhi Jin, Ge Li

TL;DR
This paper addresses structural repetition in LLM-generated code. It introduces RPG, a grammar-based decoding method that reduces repetition and improves code quality, and evaluates it with extensive experiments and a new dataset, CodeRepetEval.
Contribution
The paper formally defines structural repetition in code generation and proposes RPG, a novel grammar-based decoding approach to mitigate this issue in LLMs.
Findings
RPG significantly reduces structural repetitions in generated code.
RPG outperforms existing baselines on multiple benchmarks.
The new dataset CodeRepetEval enables comprehensive evaluation of repetition mitigation methods.
Abstract
With the advent of neural language models, the performance of code generation has been significantly boosted. However, the problem of repetitions during the generation process continues to linger. Previous work has primarily focused on content repetition, which is merely a fraction of the broader repetition problem in code generation. A more prevalent and challenging problem is structural repetition. In structural repetition, the repeated code appears in various patterns but possesses a fixed structure, which can be inherently reflected in grammar. In this paper, we formally define structural repetition and propose an efficient decoding approach called RPG, which stands for Repetition Penalization based on Grammar, to alleviate the repetition problems in code generation for LLMs. Specifically, RPG first leverages grammar rules to identify repetition problems during code generation, and…
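The distinction the abstract draws between content repetition and structural repetition can be illustrated with a small sketch. This is not the RPG algorithm, whose details are cut off in the abstract above. It is only a hypothetical illustration that uses Python's standard `ast` module as a stand-in for "grammar": statements are reduced to the sequence of syntax-node types they contain, so statements that differ in names and literals but share one grammatical shape get the same signature. The helper names `structure_signature`, `content_signature`, and `longest_run` are assumptions made for this example.

```python
import ast
from typing import Callable, Hashable


def structure_signature(stmt: ast.stmt) -> Hashable:
    # Keep only the syntax-node type names; identifiers and literal
    # values are dropped, so only the grammatical shape remains.
    return tuple(type(node).__name__ for node in ast.walk(stmt))


def content_signature(stmt: ast.stmt) -> Hashable:
    # ast.dump includes identifiers and literal values, so two
    # statements match only if their content is identical.
    return ast.dump(stmt)


def longest_run(source: str, key: Callable[[ast.stmt], Hashable]) -> int:
    # Length of the longest run of consecutive top-level statements
    # that share the same signature under `key`.
    best = run = 0
    prev = None
    for stmt in ast.parse(source).body:
        sig = key(stmt)
        run = run + 1 if sig == prev else 1
        best = max(best, run)
        prev = sig
    return best


# Hypothetical model output: the names and arguments change on every
# line, but the structure (assignment of a one-argument call) repeats.
sample = "a1 = f(1)\na2 = f(2)\na3 = f(3)\nprint(a1)\n"

structural = longest_run(sample, structure_signature)
exact = longest_run(sample, content_signature)
```

On this sample a check for identical statements sees no repetition, while the structure-based check sees the first three statements as one repeated pattern. That gap is the reason a detector limited to content repetition would miss cases like this one.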
Taxonomy
Topics: Natural Language Processing Techniques · Mathematics, Computing, and Information Processing
