Toward a consistent performance evaluation for defect prediction models

Xutong Liu; Shiran Liu; Zhaoqiang Guo; Peng Zhang; Yibiao Yang; Huihui Liu; Hongmin Lu; Yanhui Li; Lin Chen; Yuming Zhou

arXiv:2302.00394·cs.SE·February 7, 2023

TL;DR

This paper introduces MATTER, a standardized framework for evaluating defect prediction models so that performance assessments are consistent, fair, and comparable across studies. Using it, the authors find that many recent models fail to outperform the simple baseline ONE.

Contribution

The paper proposes an evaluation framework with three parts (a baseline model, a unified threshold setting, and a set of core performance indicators) to standardize how defect prediction models are assessed.

Findings

1. Most recent defect prediction models are not better than the simple baseline ONE.
2. MATTER enables consistent comparison across different studies.
3. The perceived progress in defect prediction may be overestimated.

Abstract

In the defect prediction community, many defect prediction models have been proposed, and new ones continue to be developed. However, there is no consensus on how to evaluate the performance of a newly proposed model. In this paper, we propose MATTER, a fraMework towArd a consisTenT pErformance compaRison, which makes model performance directly comparable across different studies. We take three actions to build a consistent evaluation framework for defect prediction models. First, we propose a simple and easy-to-use unsupervised baseline model ONE (glObal baseliNe modEl) to provide "a single point of comparison". Second, we propose using the SQA-effort-aligned threshold setting to make a fair comparison. Third, we suggest reporting the evaluation results in a unified way and provide a set of core performance indicators for this purpose, thus enabling an…
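The abstract does not spell out how the SQA-effort-aligned threshold is computed. As a minimal sketch of the general idea, assuming SQA effort is measured as lines of code (LOC) inspected, the code below flags modules in ranked order until a shared effort budget is spent, so a model and a baseline are judged at the same inspection cost. The size-ranking baseline, the helper names `effort_aligned_labels` and `recall`, the 50% budget, and the toy data are all hypothetical; this is not the paper's definition of ONE or of its threshold rule.

```python
from typing import List, Sequence


def effort_aligned_labels(scores: Sequence[float], loc: Sequence[int],
                          effort_ratio: float) -> List[bool]:
    """Hypothetical rule: flag modules as defective in descending score order,
    stopping before the cumulative LOC would exceed effort_ratio * total LOC."""
    budget = effort_ratio * sum(loc)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = [False] * len(scores)
    spent = 0
    for i in order:
        if spent + loc[i] > budget:
            break  # the next module would exceed the shared SQA effort budget
        flagged[i] = True
        spent += loc[i]
    return flagged


def recall(flagged: Sequence[bool], defective: Sequence[bool]) -> float:
    """Fraction of truly defective modules that fall inside the budget."""
    positives = sum(defective)
    hits = sum(1 for f, d in zip(flagged, defective) if f and d)
    return hits / positives if positives else 0.0


# Toy data (hypothetical): five modules with sizes, true labels, and model scores.
loc = [100, 50, 200, 25, 125]
defective = [False, False, True, False, True]
model_scores = [0.9, 0.8, 0.1, 0.7, 0.6]
baseline_scores = [float(n) for n in loc]  # hypothetical stand-in: larger modules first

EFFORT_RATIO = 0.5  # assumed budget: inspect at most half of all LOC

model_recall = recall(effort_aligned_labels(model_scores, loc, EFFORT_RATIO), defective)
baseline_recall = recall(effort_aligned_labels(baseline_scores, loc, EFFORT_RATIO), defective)
print(f"model recall={model_recall:.2f}, baseline recall={baseline_recall:.2f}")
```

Because both rankings consume the same LOC budget, their recall values are directly comparable; evaluating each at its own default threshold (for example, a 0.5 probability cutoff) could let them inspect very different amounts of code.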
