The Untold Impact of Learning Approaches on Software Fault-Proneness Predictions
Mohammad Jamil Ahmad, Katerina Goseva-Popstojanova, Robyn R. Lutz

TL;DR
This study investigates how the choice of learning approach influences software fault-proneness prediction performance. It finds that the choice has a substantial impact on results and should be stated explicitly in research and practice.
Contribution
Apart from one initial work, this is the first comprehensive analysis of how the learning approach affects fault-proneness prediction. It demonstrates the importance of both approach selection and class imbalance handling.
Findings
The useAllPredictAll approach outperforms usePrePredictPost in classification performance
Class imbalance explains performance differences between approaches
Addressing class imbalance equalizes the performance of different approaches
Abstract
Software fault-proneness prediction is an active research area, with many factors affecting prediction performance extensively studied. However, the impact of the learning approach (i.e., the specifics of the data used for training and the target variable being predicted) on the prediction performance has not been studied, except for one initial work. This paper explores the effects of two learning approaches, useAllPredictAll and usePrePredictPost, on the performance of software fault-proneness prediction, both within-release and across-releases. The empirical results are based on data extracted from 64 releases of twelve open-source projects. Results show that the learning approach has a substantial, and typically unacknowledged, impact on the classification performance. Specifically, using useAllPredictAll leads to significantly better performance than using usePrePredictPost…
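The abstract describes a learning approach as the combination of the data used for training and the target variable being predicted. The sketch below illustrates how the two target definitions could differ, and why that affects class balance. It rests on assumptions not stated in this summary: that useAllPredictAll labels a file as faulty if it has any fault in the release (pre- or post-release), and that usePrePredictPost labels it as faulty only if it has a post-release fault. The `FileRecord` type, its fields, and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    name: str
    pre_release_faults: int   # hypothetical field: faults found before release
    post_release_faults: int  # hypothetical field: faults found after release


def label_use_all_predict_all(rec: FileRecord) -> int:
    # Assumed definition: faulty if any fault exists, pre- or post-release.
    return int(rec.pre_release_faults + rec.post_release_faults > 0)


def label_use_pre_predict_post(rec: FileRecord) -> int:
    # Assumed definition: faulty only if a post-release fault exists.
    return int(rec.post_release_faults > 0)


def faulty_ratio(labels: list[int]) -> float:
    # Share of files in the positive (faulty) class.
    return sum(labels) / len(labels)


# Made-up illustration data, not taken from the paper.
records = [
    FileRecord("a.c", 2, 0),
    FileRecord("b.c", 0, 1),
    FileRecord("c.c", 1, 1),
    FileRecord("d.c", 0, 0),
    FileRecord("e.c", 3, 0),
]

all_labels = [label_use_all_predict_all(r) for r in records]
post_labels = [label_use_pre_predict_post(r) for r in records]

print("useAllPredictAll faulty ratio:", faulty_ratio(all_labels))
print("usePrePredictPost faulty ratio:", faulty_ratio(post_labels))
```

Under these assumed definitions, every file labeled faulty by usePrePredictPost is also labeled faulty by useAllPredictAll, so the post-release target can never have a larger faulty class. That is consistent with the finding that class imbalance explains the performance gap between the two approaches.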
Taxonomy
Topics: Software Engineering Research · Software Reliability and Analysis Research · Imbalanced Data Classification Techniques
