Exploring Generalizable Automated Program Repair with Large Language Models

Viola Campos; Ridwan Shariffdeen; Adrian Ulges; Yannic Noller

arXiv:2506.03283·cs.SE·February 23, 2026

TL;DR

This paper conducts an extensive empirical evaluation of large language models for automated program repair across multiple programming languages, highlighting their strengths, limitations, and the impact of fault localization on repair accuracy.

Contribution

It provides the first comprehensive analysis of diverse LLMs for APR across different languages and fault localization scenarios, offering insights for developing more generalizable repair techniques.

Findings

01. Different models excel in different languages.

02. Combining models improves bug fixing coverage.

03. Imperfect fault localization significantly reduces repair accuracy.
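
The second finding can be made concrete with a small sketch. It assumes "coverage" means the set of bugs for which at least one model produced a correct patch, so the combined coverage is the union of the per-model sets. The model names and bug identifiers below are hypothetical placeholders, not results from the paper.

```python
from typing import Dict, Set


def combined_coverage(fixed_by_model: Dict[str, Set[str]]) -> Set[str]:
    """Return the bugs fixed by at least one model (set union)."""
    covered: Set[str] = set()
    for fixed in fixed_by_model.values():
        covered |= fixed
    return covered


def best_single_model(fixed_by_model: Dict[str, Set[str]]) -> str:
    """Return the name of the model that fixes the most bugs on its own."""
    return max(fixed_by_model, key=lambda name: len(fixed_by_model[name]))


# Hypothetical data: which bug IDs each model repaired correctly.
fixed_by_model = {
    "model_a": {"java-1", "java-2", "java-3", "py-1"},
    "model_b": {"py-1", "py-2", "js-1"},
    "model_c": {"php-1", "java-1"},
}

best = best_single_model(fixed_by_model)
union = combined_coverage(fixed_by_model)
print(f"best single model: {best} fixes {len(fixed_by_model[best])} bugs")
print(f"all models combined fix {len(union)} bugs")
```

In this toy data the strongest single model fixes four bugs while the union covers seven, which is the kind of gap the finding describes: models that are strong in different languages fix partly disjoint sets of bugs.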

Abstract

Automated Program Repair (APR) proposes bug fixes to aid developers in maintaining software. The state of the art in this domain focuses on LLMs, leveraging their strong capabilities to comprehend specifications in natural language and to generate program code. However, despite the APR community's research achievements and industry deployments, APR still cannot generalize broadly. In this work, we present an intensive empirical evaluation of LLMs' capabilities in APR. We evaluate a diverse set of 13 recent open and closed models. In particular, we explore language-agnostic repair by utilizing benchmarks for Java, JavaScript, Python, and PHP. Besides the generalization across languages and levels of patch complexity, we also investigate the effects of fault localization (FL). Our key results include: (1) Different LLMs tend to perform best for different languages, which makes it hard to…

Taxonomy

Topics: Software Testing and Debugging Techniques · Software System Performance and Reliability · Software Engineering Research