Toward a consistent performance evaluation for defect prediction models
Xutong Liu, Shiran Liu, Zhaoqiang Guo, Peng Zhang, Yibiao Yang, Huihui Liu, Hongmin Lu, Yanhui Li, Lin Chen, Yuming Zhou

TL;DR
This paper introduces MATTER, a standardized framework for evaluating defect prediction models so that performance assessments are consistent, fair, and comparable across studies. Applying it shows that many recent models do not outperform the simple baseline ONE.
Contribution
The paper proposes a comprehensive evaluation framework, including a baseline model, a unified threshold setting, and core performance indicators, to standardize defect prediction model assessments.
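The contribution pairs a unified threshold with a fixed set of indicators. The Python sketch below shows one way such an evaluation step could look. It assumes that an effort-aligned threshold means flagging the highest-scored modules until a fixed share of total LOC (a hypothetical 20% budget) would be exceeded, and it assumes recall, precision, F1, and the share of modules flagged as the reported indicators. The paper's actual threshold rule and indicator set may differ, and the function names are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Sequence


def effort_aligned_predictions(scores: Sequence[float], loc: Sequence[int],
                               budget: float = 0.2) -> List[int]:
    """Flag modules as defective (1), highest score first, stopping before
    the flagged modules would exceed `budget` (a fraction) of total LOC.
    The 20% default is an illustrative assumption, not the paper's value."""
    if len(scores) != len(loc):
        raise ValueError("scores and loc must have the same length")
    limit = budget * sum(loc)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    preds = [0] * len(scores)
    used = 0
    for i in order:
        if used + loc[i] > limit:
            break  # the next module would overrun the inspection budget
        preds[i] = 1
        used += loc[i]
    return preds


@dataclass
class Indicators:
    recall: float
    precision: float
    f1: float
    flagged_share: float  # fraction of modules flagged for inspection


def core_indicators(preds: Sequence[int], labels: Sequence[int]) -> Indicators:
    """Compute an assumed set of indicators from binary predictions."""
    tp = sum(1 for p, y in zip(preds, labels) if p and y)
    fp = sum(1 for p, y in zip(preds, labels) if p and not y)
    fn = sum(1 for p, y in zip(preds, labels) if not p and y)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return Indicators(recall, precision, f1, sum(preds) / len(preds))


# Toy usage with made-up scores, module sizes, and defect labels.
scores = [0.9, 0.8, 0.4, 0.3, 0.1]
loc = [100, 50, 200, 30, 120]
labels = [1, 1, 0, 0, 1]
print(core_indicators(effort_aligned_predictions(scores, loc), labels))
```

Because every model is cut off at the same LOC budget rather than at its own probability threshold (such as 0.5), the resulting indicators describe what each model delivers for the same inspection effort.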
Findings
Most recent defect prediction models are not better than the simple baseline ONE.
MATTER enables consistent comparison across different studies.
The perceived progress in defect prediction may be overestimated.
Abstract
In the defect prediction community, many defect prediction models have been proposed, and new models continue to be developed. However, there is no consensus on how to evaluate the performance of a newly proposed model. In this paper, we propose MATTER, a fraMework towArd a consisTenT pErformance compaRison, which makes model performance directly comparable across different studies. We take three actions to build a consistent evaluation framework for defect prediction models. First, we propose a simple and easy-to-use unsupervised baseline model, ONE (glObal baseliNe modEl), to provide "a single point of comparison". Second, we propose using the SQA-effort-aligned threshold setting to make a fair comparison. Third, we suggest reporting the evaluation results in a unified way and provide a set of core performance indicators for this purpose, thus enabling an…
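The idea of "a single point of comparison" can be illustrated with a minimal Python sketch. The abstract does not define how ONE ranks modules, so `size_baseline_scores` here is a hypothetical stand-in: an unsupervised ranker that uses no labels or training data and simply prefers smaller modules. A candidate model and the baseline are evaluated at the same assumed 20% LOC budget, and the difference in recall indicates whether the model adds anything over the baseline. All names and the budget are assumptions for illustration.

```python
from typing import List, Sequence


def size_baseline_scores(loc: Sequence[int]) -> List[float]:
    """Hypothetical unsupervised baseline (NOT the paper's definition of ONE):
    no labels, no training; smaller modules get higher scores (1/LOC)."""
    return [1.0 / max(n, 1) for n in loc]


def recall_at_effort(scores: Sequence[float], loc: Sequence[int],
                     labels: Sequence[int], budget: float = 0.2) -> float:
    """Share of defective modules found when inspecting the highest-scored
    modules until the next one would exceed `budget` of total LOC."""
    limit = budget * sum(loc)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    used, found = 0, 0
    for i in order:
        if used + loc[i] > limit:
            break
        used += loc[i]
        found += labels[i]
    defects = sum(labels)
    return found / defects if defects else 0.0


def gain_over_baseline(model_scores: Sequence[float], loc: Sequence[int],
                       labels: Sequence[int], budget: float = 0.2) -> float:
    """Positive if the model finds more defects than the baseline at the
    same inspection budget; zero or negative otherwise."""
    base = recall_at_effort(size_baseline_scores(loc), loc, labels, budget)
    return recall_at_effort(model_scores, loc, labels, budget) - base


# Toy data: made-up model scores, module sizes, and defect labels.
loc = [100, 50, 200, 30, 120]
labels = [0, 1, 0, 1, 1]
model_scores = [0.9, 0.8, 0.4, 0.3, 0.1]
print(gain_over_baseline(model_scores, loc, labels))
```

With this toy data the size-based ranker reaches two of the three defective modules within the budget, while the model spends the whole budget on one large, clean module, so the gain is negative. The numbers are illustrative only and are not results from the paper.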
