The Impact of Dormant Defects on Defect Prediction: a Study of 19 Apache Projects
Davide Falessi, Aalok Ahluwalia, Massimiliano Di Penta

TL;DR
This study examines how dormant defects, discovered long after their introduction, affect defect prediction accuracy in open source projects and proposes data filtering as a mitigation strategy.
Contribution
It analyzes the impact of dormant defects on classifier accuracy and evaluates the effectiveness of removing recent non-defective data to improve predictions.
Findings
Dormant defects reduce recall of defect classifiers.
Removing recent non-defective data improves classifier accuracy.
Mitigating dormant defects enhances defect dataset quality.
Abstract
Defect prediction models can help prioritize testing, analysis, or code review activities, and have been the subject of substantial effort in academia and of some applications in industrial contexts. A necessary precondition for creating a defect prediction model is the availability of defect data from the history of projects. If this data is noisy, the resulting defect prediction model could be unreliable. One cause of noise in defect datasets is the presence of "dormant defects", i.e., defects discovered several releases after their introduction. This can cause a class to be labeled as defect-free when it is not; such a class is therefore "snoring". In this paper, we investigate the impact of snoring on classifiers' accuracy and the effectiveness of a possible countermeasure, i.e., dropping too recent data from a training set. We analyze the accuracy of 15…
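The countermeasure named in the abstract, dropping too recent data from a training set, can be illustrated with a minimal sketch. The row layout (`cls`, `release`, `defective`), the function name `drop_recent_releases`, and the parameter `n_drop` are assumptions made for illustration and are not taken from the paper; the paper's actual procedure and choice of cutoff may differ.

```python
from typing import List


def drop_recent_releases(rows: List[dict], n_drop: int) -> List[dict]:
    """Return the rows whose release is not among the n_drop most recent.

    Hypothetical helper: each row is assumed to be a dict with an
    integer 'release' key. Labels from recent releases are the ones most
    likely to hide dormant defects, so they are excluded from training.
    """
    if n_drop <= 0:
        return list(rows)
    releases = sorted({row["release"] for row in rows})
    too_recent = set(releases[-n_drop:])
    return [row for row in rows if row["release"] not in too_recent]


# Toy training set (made-up rows, for illustration only).
training = [
    {"cls": "A.java", "release": 1, "defective": True},
    {"cls": "B.java", "release": 1, "defective": False},
    {"cls": "A.java", "release": 2, "defective": False},
    {"cls": "B.java", "release": 3, "defective": False},
]

filtered = drop_recent_releases(training, n_drop=1)
```

Here the single row from release 3 is excluded because its "non-defective" label may simply reflect a defect that has not been discovered yet. How many releases to drop is a trade-off between label noise and training set size, and this sketch leaves that choice to the caller.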
