Towards Reliable Evaluation of Neural Program Repair with Natural Robustness Testing

Thanh Le-Cong; Dat Nguyen; Bach Le; Toby Murray

arXiv:2402.11892 · cs.SE · November 14, 2024 · 3 cites

TL;DR

This paper advocates evaluating the robustness of neural program repair (NPR) techniques on naturally occurring code transformations, shows how such transformations affect NPR performance, and proposes an LLM-based metric for assessing naturalness.

Contribution

It introduces a naturalness-focused robustness testing framework for NPR, including a human study on transformation naturalness and an LLM-based automatic assessment method.

Findings

1. Only 60% of transformations are natural according to human judgment.
2. NPR performance significantly drops on transformed datasets.
3. Different NPR techniques show varied robustness, indicating evaluation biases.

Abstract

In this paper, we propose shifting the focus of robustness evaluation for Neural Program Repair (NPR) techniques toward naturally-occurring data transformations. To accomplish this, we first examine the naturalness of semantic-preserving transformations through a two-stage human study. This study includes (1) interviews with senior software developers to establish concrete criteria for evaluating the naturalness of these transformations, and (2) a survey involving 10 developers to assess the naturalness of 1,178 transformations, i.e., pairs of original and transformed programs, applied to 225 real-world bugs. Our findings show that only 60% of these transformations are deemed natural, while 20% are considered unnatural, with strong agreement among annotators. Moreover, the unnaturalness of these transformations significantly impacts both their applicability to benchmarks and the…
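To make the term concrete, here is a minimal sketch of one semantic-preserving transformation, variable renaming, applied to a toy function. It is an illustration only and is not taken from the paper or its repository: the `RenameVariable` helper, the `total` function, and the two replacement names are all hypothetical, and labelling one rename "natural" and the other "unnatural" is an assumption made for the example, not a judgment from the study.

```python
import ast

class RenameVariable(ast.NodeTransformer):
    """Hypothetical helper: rename every occurrence of one identifier."""

    def __init__(self, old, new):
        self.old = old
        self.new = new

    def visit_Name(self, node):
        # Covers both reads and writes of the variable.
        if node.id == self.old:
            node.id = self.new
        return node

    def visit_arg(self, node):
        # Covers function parameters with the same name.
        if node.arg == self.old:
            node.arg = self.new
        return node

# Toy program standing in for an "original" benchmark program.
ORIGINAL = """
def total(values):
    result = 0
    for v in values:
        result += v
    return result
"""

def rename(source, old, new):
    """Return the source with `old` renamed to `new` (requires Python 3.9+)."""
    tree = RenameVariable(old, new).visit(ast.parse(source))
    return ast.unparse(tree)

def run_total(source, values):
    """Execute the source and call its `total` function."""
    namespace = {}
    exec(source, namespace)
    return namespace["total"](values)

# Assumed for illustration: "acc" reads like a name a developer might write,
# while "x9_q" does not. Both variants behave identically to the original.
natural = rename(ORIGINAL, "result", "acc")
unnatural = rename(ORIGINAL, "result", "x9_q")
```

Both variants compute the same sums as the original, which is what makes the transformation semantic-preserving; whether a developer would plausibly write the transformed code is the separate question that the human study addresses.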

Code & Models

Repositories

thanhlecongg/naturaltransformationforbenchmarkingnpr (Official)

Taxonomy

Topics: Software Engineering Research · Software Reliability and Analysis Research · Software System Performance and Reliability
