Exploring Generalizable Automated Program Repair with Large Language Models
Viola Campos, Ridwan Shariffdeen, Adrian Ulges, Yannic Noller

TL;DR
This paper conducts an extensive empirical evaluation of large language models for automated program repair across multiple programming languages, highlighting their strengths, limitations, and the impact of fault localization on repair accuracy.
Contribution
It provides the first comprehensive analysis of diverse LLMs for APR across different languages and fault localization scenarios, offering insights for developing more generalizable repair techniques.
Findings
Different models excel in different languages.
Combining models improves bug fixing coverage.
Imperfect fault localization significantly reduces repair accuracy.
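As a rough illustration of the second finding, the sketch below computes how many bugs a set of models fixes jointly (the union of their individual fixes) and picks the best combination of a given size. The model names, bug IDs, and results are hypothetical toy data, not numbers from the paper, and the helper functions are illustrative assumptions rather than the authors' evaluation code.

```python
from itertools import combinations

# Hypothetical toy data (NOT from the paper): bug IDs each model repaired.
fixed_by_model = {
    "model_a": {"java-1", "java-2", "py-1"},
    "model_b": {"py-1", "py-2", "js-1"},
    "model_c": {"php-1"},
}
# Hypothetical benchmark: all bugs under evaluation.
all_bugs = {"java-1", "java-2", "py-1", "py-2", "js-1", "php-1", "php-2"}


def coverage(models):
    """Fraction of benchmark bugs fixed by at least one of the given models."""
    fixed = set().union(*(fixed_by_model[m] for m in models))
    return len(fixed) / len(all_bugs)


def best_combination(k):
    """Return the k-model combination with the highest joint coverage."""
    return max(combinations(sorted(fixed_by_model), k), key=coverage)


if __name__ == "__main__":
    for name in sorted(fixed_by_model):
        print(name, round(coverage([name]), 2))
    pair = best_combination(2)
    print("best pair:", pair, round(coverage(pair), 2))
```

In this toy setup, no single model fixes more than 3 of the 7 bugs, while the best pair fixes 5, which is the kind of gain from combining models that the finding describes.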
Abstract
Automated Program Repair (APR) proposes bug fixes to aid developers in maintaining software. The state of the art in this domain focuses on LLMs, leveraging their strong capabilities to comprehend specifications in natural language and to generate program code. However, despite the APR community's research achievements and industry deployments, APR still cannot generalize broadly. In this work, we present an intensive empirical evaluation of LLMs' capabilities in APR. We evaluate a diverse set of 13 recent open and closed models. In particular, we explore language-agnostic repair by utilizing benchmarks for Java, JavaScript, Python, and PHP. Besides the generalization across languages and levels of patch complexity, we also investigate the effects of fault localization (FL). Our key results include: (1) Different LLMs tend to perform best for different languages, which makes it hard to…
Taxonomy
Topics: Software Testing and Debugging Techniques · Software System Performance and Reliability · Software Engineering Research
