On the Time-Based Conclusion Stability of Cross-Project Defect Prediction Models
Abdul Ali Bangash, Hareem Sahar, Abram Hindle, Karim Ali

TL;DR
This paper investigates whether conclusions about cross-project defect prediction models remain stable over time. It finds that model performance varies significantly depending on the time period evaluated, which challenges the generalization of empirical findings beyond the evaluated period.
Contribution
It applies a time-aware evaluation to defect prediction models, training only on past data and evaluating only on future data. This exposes the temporal instability of their performance and supports limiting claims to the evaluated context.
Findings
Model performance varies across different time periods.
Product updates can drastically change defect prediction accuracy.
Empirical claims should be limited to specific evaluation contexts.
Abstract
Researchers in empirical software engineering often make claims based on observable data such as defect reports. Unfortunately, in many cases, these claims are generalized beyond the data sets that have been evaluated. Will the researcher's conclusions hold a year from now for the same software projects? Perhaps not. Recent studies show that in the area of Software Analytics, conclusions over different data sets are usually inconsistent. In this article, we empirically investigate whether conclusions in the area of defect prediction truly exhibit stability throughout time or not. Our investigation applies a time-aware evaluation approach where models are trained only on the past, and evaluations are executed only on the future. Through this time-aware evaluation, we show that depending on which time period we evaluate defect predictors, their performance, in terms of F-Score, the area…
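The time-aware evaluation described in the abstract (train only on the past, evaluate only on the future) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the records, the single `loc` feature, the cutoff dates, and the threshold "model" are all hypothetical stand-ins chosen to keep the example self-contained.

```python
from datetime import date


def time_aware_splits(records, cutoffs):
    """Yield (cutoff, train, test): train is strictly before the cutoff, test is on or after it."""
    for cutoff in cutoffs:
        train = [r for r in records if r["date"] < cutoff]
        test = [r for r in records if r["date"] >= cutoff]
        if train and test:
            yield cutoff, train, test


def fit_threshold(train):
    """Toy model (assumption): midpoint between mean loc of defective and clean records."""
    bad = [r["loc"] for r in train if r["defective"]]
    good = [r["loc"] for r in train if not r["defective"]]
    if not bad or not good:
        # Only one class seen in training: fall back to the overall mean.
        all_loc = [r["loc"] for r in train]
        return sum(all_loc) / len(all_loc)
    return (sum(bad) / len(bad) + sum(good) / len(good)) / 2


def f_score(actual, predicted):
    """F1: harmonic mean of precision and recall; 0.0 when there are no true positives."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def evaluate_over_time(records, cutoffs):
    """Return {cutoff: F-score}, training on the past and testing on the future."""
    results = {}
    for cutoff, train, test in time_aware_splits(records, cutoffs):
        threshold = fit_threshold(train)
        actual = [r["defective"] for r in test]
        predicted = [r["loc"] > threshold for r in test]
        results[cutoff] = f_score(actual, predicted)
    return results


# Hypothetical data: one record per change, with a size metric and a defect label.
records = [
    {"date": date(2015, 1, 10), "loc": 120, "defective": True},
    {"date": date(2015, 6, 2), "loc": 15, "defective": False},
    {"date": date(2016, 3, 5), "loc": 200, "defective": True},
    {"date": date(2016, 9, 18), "loc": 30, "defective": False},
    {"date": date(2017, 2, 1), "loc": 40, "defective": True},
    {"date": date(2017, 8, 9), "loc": 150, "defective": False},
]
cutoffs = [date(2016, 1, 1), date(2017, 1, 1)]

results = evaluate_over_time(records, cutoffs)
for cutoff, score in results.items():
    print(cutoff.isoformat(), round(score, 3))
```

Even on this toy data the F-score differs between the two cutoffs, which is the kind of time-dependence the paper examines. A real study would substitute an actual learner and project data, and might test on a bounded future window rather than on everything after the cutoff.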
