Practical Program Repair in the Era of Large Pre-trained Language Models

Chunqiu Steven Xia; Yuxiang Wei; Lingming Zhang

arXiv:2210.14179 · cs.SE · December 11, 2024 · 29 cites

TL;DR

This paper conducts a comprehensive study on using large pre-trained language models for automated program repair, demonstrating their superior performance over traditional methods across multiple datasets and languages.

Contribution

The paper presents the first extensive evaluation of state-of-the-art large PLMs for APR, exploring different repair settings and highlighting the importance of suffix code for infilling models.

Findings

01 Larger PLMs tend to perform better at bug fixing.

02 Infilling models that are given the suffix code (the code after the buggy location) generate more and higher-quality patches.

03 PLMs outperform existing APR techniques on multiple datasets.
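
To make the difference between the generative and infilling settings concrete, the following is a minimal sketch of how the two kinds of repair prompt could be assembled around a buggy hunk. It is an illustration under stated assumptions, not the paper's implementation: the function names, the toy `max_of` example, and the `<INFILL>` placeholder are all hypothetical, and real infilling models each define their own mask token and prompt format.

```python
from typing import List, Tuple

# Hypothetical placeholder; real infilling models each define their own mask token.
MASK = "<INFILL>"


def split_at_bug(lines: List[str], bug_start: int, bug_end: int) -> Tuple[str, str, str]:
    """Split source into (prefix, buggy hunk, suffix).

    bug_start and bug_end are 0-based line indices; bug_end is exclusive.
    """
    prefix = "".join(lines[:bug_start])
    buggy = "".join(lines[bug_start:bug_end])
    suffix = "".join(lines[bug_end:])
    return prefix, buggy, suffix


def generative_prompt(prefix: str) -> str:
    """Left-to-right setting: the model sees only the code before the bug."""
    return prefix


def infilling_prompt(prefix: str, suffix: str) -> str:
    """Infilling setting: the model sees code on both sides of the masked hunk."""
    return prefix + MASK + suffix


def apply_patch(prefix: str, candidate: str, suffix: str) -> str:
    """Rebuild the full function from a candidate replacement for the buggy hunk."""
    return prefix + candidate + suffix


# Toy buggy function (assumed for illustration): line index 2 should return b.
source = [
    "def max_of(a, b):\n",
    "    if a < b:\n",
    "        return a\n",
    "    return a\n",
]

prefix, buggy, suffix = split_at_bug(source, 2, 3)
print(generative_prompt(prefix))
print(infilling_prompt(prefix, suffix))

# A candidate patch, written by hand here in place of a model's output.
patched = apply_patch(prefix, "        return b\n", suffix)
```

In this sketch the infilling prompt carries the trailing `return a` line, which tells the model what the code after the gap already does; the prefix-only prompt does not, which is one intuition for why suffix context can help.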

Abstract

Automated Program Repair (APR) aims to help developers automatically patch software bugs. However, current state-of-the-art traditional and learning-based APR techniques face the problem of limited patch variety, failing to fix complicated bugs. This is mainly due to the reliance on bug-fixing datasets to craft fix templates or directly predict potential patches. Large Pre-Trained Language Models (PLMs), trained using billions of text/code tokens, can potentially help avoid this issue. Very recently, researchers have directly leveraged PLMs for APR without relying on any bug-fixing datasets. Meanwhile, such existing work either failed to include state-of-the-art PLMs or was not evaluated on realistic datasets. In this work, we perform the first extensive study on directly applying PLMs for APR. We select 9 recent state-of-the-art PLMs, including both generative and infilling models,…

Taxonomy

Topics: Software Engineering Research · Software Testing and Debugging Techniques · Software System Performance and Reliability