Exploring the relationship between performance metrics and cost saving potential of defect prediction models
Steffen Tunkel, Steffen Herbold

TL;DR
This study investigates whether traditional performance metrics reliably indicate the cost-saving potential of defect prediction models, finding no stable relationship and emphasizing the need to evaluate cost savings directly.
Contribution
It provides empirical evidence that performance metrics are poor proxies for cost savings in defect prediction, proposing direct evaluation of economic benefits.
Findings
No stable relationship between performance metrics and cost savings.
Performance metrics fail to account for the large software artifacts that drive costs.
Cost savings should be evaluated directly rather than inferred from metrics.
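The findings above can be illustrated with a toy sketch. It is not the cost model used in the paper: the `Artifact` class, the `toy_cost` function, and the parameters `qa_cost_per_line` and `missed_defect_cost` are hypothetical and chosen only to show how two predictors with identical recall and precision can differ widely in cost once artifact size is taken into account.

```python
from dataclasses import dataclass


@dataclass
class Artifact:
    size: int        # hypothetical size unit, e.g. lines of code
    defective: bool  # ground-truth label


def recall_precision(artifacts, predicted):
    """Return (recall, precision) for boolean predictions."""
    tp = sum(1 for a, p in zip(artifacts, predicted) if a.defective and p)
    fp = sum(1 for a, p in zip(artifacts, predicted) if not a.defective and p)
    fn = sum(1 for a, p in zip(artifacts, predicted) if a.defective and not p)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


def toy_cost(artifacts, predicted, qa_cost_per_line=1.0, missed_defect_cost=500.0):
    """Assumed cost model: QA effort scales with the size of every flagged
    artifact, and each missed defective artifact incurs a fixed cost."""
    qa = sum(a.size for a, p in zip(artifacts, predicted) if p) * qa_cost_per_line
    missed = sum(1 for a, p in zip(artifacts, predicted) if a.defective and not p)
    return qa + missed * missed_defect_cost


artifacts = [
    Artifact(size=2000, defective=True),
    Artifact(size=100, defective=True),
    Artifact(size=1500, defective=False),
    Artifact(size=50, defective=False),
]

model_a = [True, False, True, False]  # flags the two large artifacts
model_b = [False, True, False, True]  # flags the two small artifacts

for name, pred in [("A", model_a), ("B", model_b)]:
    recall, precision = recall_precision(artifacts, pred)
    print(name, recall, precision, toy_cost(artifacts, pred))
```

Under these assumed numbers both predictors reach a recall and precision of 0.5, yet their toy costs are 4000.0 and 650.0, because the size of the flagged artifacts dominates the total.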
Abstract
Context: Performance metrics are a core component of the evaluation of any machine learning model and are used to compare models and estimate their usefulness. Recent work has started to question the validity of many performance metrics for this purpose in the context of software defect prediction. Objective: Within this study, we explore the relationship between performance metrics and the cost saving potential of defect prediction models. We study whether performance metrics are suitable proxies to evaluate the cost saving capabilities and derive a theory for the relationship between performance metrics and cost saving potential. Methods: We measure performance metrics and cost saving potential in defect prediction experiments. We use a multinomial logit model, decision trees, and random forests to model the relationship between the metrics and the cost savings. Results: We could not find a…
Taxonomy
Topics: Software Engineering Research · Software Reliability and Analysis Research · Imbalanced Data Classification Techniques
