Self-Bootstrapping Automated Program Repair: Using LLMs to Generate and Evaluate Synthetic Training Data for Bug Repair
David de-Fitero-Dominguez, Antonio Garcia-Cabot, Eva Garcia-Lopez

TL;DR
This paper introduces a self-bootstrapping approach using LLMs to generate and evaluate synthetic training data, significantly improving automated program repair across multiple languages and bug types.
Contribution
It presents a novel two-phase methodology for synthetic data generation and quality assessment, enhancing APR performance with less computational effort.
Findings
Synthetic dataset improved Top@1 prediction accuracy by 47%
Achieved statistically significant improvements over baseline systems
Validated approach across 12 programming languages and 13 bug categories
Abstract
This paper presents a novel methodology for enhancing Automated Program Repair (APR) through synthetic data generation utilizing Large Language Models (LLMs). Current APR systems are constrained by the limited availability of high-quality training data encompassing diverse bug types across multiple programming languages. The proposed approach addresses this limitation through a two-phase process: synthetic sample generation followed by a rigorous quality assessment. Multiple state-of-the-art LLMs were employed to generate approximately 30,000 paired examples of buggy and fixed code across 12 programming languages and 13 bug categories. Subsequently, these samples underwent cross-model evaluation against five criteria: correctness, code quality, security, performance, and completeness. Experimental evaluation on the VulRepair test set showed statistically significant…
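The quality-assessment phase described in the abstract can be pictured as a filter over generated pairs. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the 0-10 score scale, the averaging rule, the 7.0 threshold, the exclusion of a generator's own evaluation, and all names (Sample, aggregate_score, filter_samples) are hypothetical. Only the five criteria come from the abstract.

```python
from dataclasses import dataclass
from statistics import mean

# The five criteria named in the abstract.
CRITERIA = ("correctness", "code_quality", "security", "performance", "completeness")


@dataclass
class Sample:
    """One synthetic training pair (field names are hypothetical)."""
    language: str
    bug_category: str
    buggy_code: str
    fixed_code: str
    generator: str  # name of the model that produced the pair


def aggregate_score(scores):
    """Average the five criterion scores (assumed 0-10 scale)."""
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    return mean(scores[c] for c in CRITERIA)


def filter_samples(samples, evaluations, threshold=7.0):
    """Keep samples whose mean cross-model score meets the threshold.

    evaluations maps a sample index to {evaluator_name: {criterion: score}}.
    Scores from the model that generated the sample are skipped, which is
    an assumed reading of "cross-model evaluation".
    """
    kept = []
    for index, sample in enumerate(samples):
        per_model = [
            aggregate_score(scores)
            for evaluator, scores in evaluations.get(index, {}).items()
            if evaluator != sample.generator
        ]
        # Samples with no independent evaluation are dropped.
        if per_model and mean(per_model) >= threshold:
            kept.append(sample)
    return kept
```

In this sketch a pair is retained only when at least one model other than its generator has scored it and the averaged score clears the threshold. A real pipeline might instead weight criteria differently or require a minimum score on each one.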
