Automated Patch Assessment for Program Repair at Scale

He Ye; Matias Martinez; Martin Monperrus

arXiv:1909.13694 · cs.SE · May 10, 2021

TL;DR

This paper presents an improved automated correctness assessment method for program repair patches based on Random testing with Ground Truth (RGT), evaluated on a curated dataset of 638 Defects4J patches generated by 14 repair systems.

Contribution

It improves the oracle used by RGT, raising the state-of-the-art performance of automatic patch assessment by 190%, and shows that RGT is reliable enough to support overfitting analysis when evaluating program repair systems.

Findings

01 Improved patch assessment accuracy by 190%

02 RGT is reliable for overfitting analysis

03 Largest study enhancing external validity

Abstract

In this paper, we perform automatic correctness assessment of patches generated by program repair systems. We consider the human-written patch as a ground-truth oracle and randomly generate tests based on it, a technique proposed by Shamshiri et al. that we call Random testing with Ground Truth (RGT) in this paper. We build a curated dataset of 638 patches for Defects4J generated by 14 state-of-the-art repair systems and evaluate automated patch assessment on this dataset. The results of this study are novel and significant. First, we improve the state-of-the-art performance of automatic patch assessment with RGT by 190% by improving the oracle. Second, we show that RGT is reliable enough to help scientists do overfitting analysis when they evaluate program repair systems. Third, we improve the external validity of program repair knowledge with the largest study ever.
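
The idea behind RGT described in the abstract can be illustrated with a toy sketch. This is not the authors' tooling: the functions `rgt_assess`, `human_patch`, `overfitting_patch`, and `correct_patch` are hypothetical names introduced here, and the sketch assumes that a machine-generated patch is flagged as overfitting when its behavior differs from the human-written patch on at least one randomly generated input.

```python
import random
from typing import Callable, List


def rgt_assess(
    ground_truth: Callable[[int], int],
    candidate: Callable[[int], int],
    inputs: List[int],
) -> List[int]:
    """Return the inputs on which the candidate patch disagrees with the
    ground-truth (human-written) patch. An empty list means no behavioral
    difference was observed on these inputs."""
    failing = []
    for x in inputs:
        expected = ground_truth(x)  # the human patch acts as the oracle
        try:
            actual = candidate(x)
        except Exception:
            # A crash in the candidate counts as a behavioral difference.
            failing.append(x)
            continue
        if actual != expected:
            failing.append(x)
    return failing


# Toy subject (hypothetical): an absolute-value function whose original
# failing test checked only the input -3.
def human_patch(x: int) -> int:
    return x if x >= 0 else -x


def overfitting_patch(x: int) -> int:
    # Passes the original failing test but is wrong for other negatives.
    return 3 if x == -3 else x


def correct_patch(x: int) -> int:
    return abs(x)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    random_inputs = [rng.randint(-100, 100) for _ in range(200)]
    for name, patch in [("overfitting", overfitting_patch),
                        ("correct", correct_patch)]:
        diffs = rgt_assess(human_patch, patch, random_inputs)
        verdict = "overfitting" if diffs else "no difference found"
        print(f"{name} patch: {verdict} ({len(diffs)} differing inputs)")
```

In this sketch an empty result does not prove that a patch is correct; it only means that the sampled inputs exposed no difference from the human-written patch.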

Code & Models

Repositories

KTH/drr (official)