Automated Classification of Overfitting Patches with Statically Extracted Code Features

He Ye; Jian Gu; Matias Martinez; Thomas Durieux; Martin Monperrus

arXiv:1910.12057 · cs.SE · August 9, 2021

TL;DR

This paper introduces ODS, a system that combines static analysis and machine learning to detect overfitting patches in automatic program repair. In its evaluation, ODS correctly classifies 71.9% of patches.

Contribution

The paper presents a novel static feature extraction and supervised learning approach for overfitting patch detection, outperforming existing methods.

Findings

01. ODS correctly classifies 71.9% of patches, surpassing previous methods.

02. The evaluation is large-scale, covering 10,302 patches from multiple benchmarks.

03. ODS is applicable as a post-processing step for various APR systems.

Abstract

Automatic program repair (APR) aims to reduce the cost of manually fixing software defects. However, APR suffers from generating a multitude of overfitting patches, those patches that fail to correctly repair the defect beyond making the tests pass. This paper presents a novel overfitting patch detection system called ODS to assess the correctness of APR patches. ODS first statically compares a patched program and a buggy program in order to extract code features at the abstract syntax tree (AST) level. Then, ODS uses supervised learning with the captured code features and patch correctness labels to automatically learn a probabilistic model. The learned ODS model can then finally be applied to classify new and unseen program repair patches. We conduct a large-scale experiment to evaluate the effectiveness of ODS on patch correctness classification based on 10,302 patches from…
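The pipeline described in the abstract (compare buggy and patched code at the AST level, extract features, learn a probabilistic classifier from labeled patches) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the ODS implementation: it uses Python's standard `ast` module as a stand-in for the paper's AST analysis, a hypothetical six-entry feature vocabulary (`NODE_TYPES`), a plain logistic regression in place of whatever learner ODS uses, and invented toy labels (1 = overfitting, 0 = correct).

```python
import ast
import math
from collections import Counter

# Hypothetical feature vocabulary: AST node types whose count change we track.
NODE_TYPES = ["If", "Compare", "Call", "Return", "Assign", "BoolOp"]


def node_counts(source):
    """Count how often each AST node type occurs in a source snippet."""
    return Counter(type(node).__name__ for node in ast.walk(ast.parse(source)))


def extract_features(buggy, patched):
    """Signed change in node-type counts from the buggy to the patched code."""
    before, after = node_counts(buggy), node_counts(patched)
    return [float(after[t] - before[t]) for t in NODE_TYPES]


def predict_proba(weights, bias, x):
    """Probability that a patch with feature vector x is overfitting."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train(X, y, epochs=500, lr=0.1):
    """Fit a logistic regression with plain stochastic gradient descent."""
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            err = predict_proba(weights, bias, x) - label
            weights = [w - lr * err * v for w, v in zip(weights, x)]
            bias -= lr * err
    return weights, bias


# Toy data: one buggy function and two candidate patches.
# The labels are invented for this sketch (1 = overfitting, 0 = correct).
BUGGY = "def f(x):\n    return 10 / x"
PATCHES = [
    ("def f(x):\n    if x == 0:\n        return 0\n    return 10 / x", 0),
    ("def f(x):\n    return 10 / max(x, 1)", 1),
]

X = [extract_features(BUGGY, patch) for patch, _ in PATCHES]
y = [label for _, label in PATCHES]
weights, bias = train(X, y)

# Score a new, unseen patch against the same buggy program.
new_patch = "def f(x):\n    return 10 / abs(x)"
score = predict_proba(weights, bias, extract_features(BUGGY, new_patch))
```

The point of the sketch is the shape of the workflow: feature extraction needs only the two program versions and no test execution, so a trained model can score patches produced by any repair tool as a post-processing step.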
