Variance of ML-based software fault predictors: are we really improving fault prediction?

Xhulja Shahini; Domenic Bubel; Andreas Metzger

arXiv:2310.17264 · cs.SE · October 27, 2023 · 1 citation

TL;DR

This paper investigates the impact of stochastic elements in ML-based fault prediction models, revealing significant variance in accuracy that challenges reproducibility and practical reliability.

Contribution

It provides an experimental analysis of variance caused by nondeterminism in ML fault predictors, highlighting the need to address this issue for reliable software quality assurance.

Findings

1. A maximum variance of 10.10% in per-class accuracy caused by nondeterminism-introducing (NI) factors
2. Nondeterminism significantly affects the performance of fault prediction models
3. A discussion of strategies to mitigate the effects of this variance
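To make the per-class accuracy figure concrete, here is a minimal sketch of how such a spread could be computed from repeated training runs that differ only in their random seed. It assumes that "per-class accuracy" means the fraction of instances of each class (for example, faulty and non-faulty modules) predicted correctly, and it reads "variance" as the gap between the best and worst run for each class; the paper's exact definitions may differ. The function names `per_class_accuracy` and `max_spread` are hypothetical, and the predictions in the usage example are invented toy values, not results from the paper.

```python
from collections import defaultdict


def per_class_accuracy(y_true, y_pred):
    """Fraction of instances of each class that were predicted correctly."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return {c: correct[c] / total[c] for c in total}


def max_spread(runs):
    """Given (y_true, y_pred) pairs from repeated runs, return the per-class gap
    between the best and worst run, and the largest gap over all classes."""
    per_run = [per_class_accuracy(t, p) for t, p in runs]
    classes = set().union(*per_run)
    spread = {}
    for c in classes:
        accs = [r[c] for r in per_run if c in r]
        spread[c] = max(accs) - min(accs)
    return spread, max(spread.values())


# Toy example: label 1 = faulty, 0 = non-faulty; three hypothetical runs.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
runs = [
    (y_true, [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]),
    (y_true, [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]),
    (y_true, [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]),
]
spread, worst = max_spread(runs)
print(f"largest per-class spread: {worst:.2%}")  # prints: largest per-class spread: 50.00%
```

Looking at each class separately matters for fault prediction because faulty modules are usually the minority class, so a model can keep a stable overall accuracy while its accuracy on faulty modules swings noticeably between runs.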

Abstract

Software quality assurance activities become increasingly difficult as software systems grow in size and complexity. Moreover, testing becomes even more expensive for large-scale systems. Thus, to allocate quality assurance resources effectively, researchers have proposed fault prediction (FP), which uses machine learning (ML) to predict fault-prone code areas. However, ML algorithms typically use stochastic elements to increase the generalizability of prediction models and the efficiency of the training process. These stochastic elements, also known as nondeterminism-introducing (NI) factors, lead to variance in the training process and, as a result, to variance in prediction accuracy and training time. This variance poses a challenge for reproducibility in research. More importantly, while fault prediction models may have shown…
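The mechanism described in the abstract can be illustrated with a toy model. This is a minimal sketch, assuming a plain logistic regression trained with stochastic gradient descent on synthetic two-feature data; it is not the setup, dataset, or model family used in the paper. Two NI factors, random weight initialisation and reshuffling of the training order in every epoch, are both driven by one training seed, so runs that differ only in that seed end with different models and possibly different test accuracy. All function names (`make_data`, `train`, `accuracy`) and hyperparameters are hypothetical.

```python
import math
import random


def make_data(n, seed=0):
    """Hypothetical synthetic 'code metrics': two features, label 1 = faulty.
    The data seed is fixed, so only the training seed varies between runs."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = 1 if rng.random() < 0.3 else 0
        # Faulty modules are shifted towards larger feature values, with overlap.
        x = [rng.gauss(1.0 if y else 0.0, 1.0), rng.gauss(0.5 if y else 0.0, 1.0)]
        data.append((x, y))
    return data


def train(data, seed, epochs=5, lr=0.1):
    """Logistic regression trained with SGD. Both NI factors use `seed`:
    random weight initialisation and per-epoch shuffling of the data order."""
    rng = random.Random(seed)
    w = [rng.uniform(-1, 1), rng.uniform(-1, 1)]  # NI factor 1: random init
    b = rng.uniform(-1, 1)
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)  # NI factor 2: random visiting order
        for i in order:
            x, y = data[i]
            p = 1.0 / (1.0 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))
            g = p - y  # gradient of the log-loss w.r.t. the logit
            w = [w[0] - lr * g * x[0], w[1] - lr * g * x[1]]
            b -= lr * g
    return w, b


def accuracy(model, data):
    """Share of instances whose predicted label matches the true label."""
    w, b = model
    hits = sum((w[0] * x[0] + w[1] * x[1] + b > 0) == (y == 1) for x, y in data)
    return hits / len(data)


data = make_data(300)
train_set, test_set = data[:200], data[200:]

# Same data and hyperparameters; only the training seed changes.
accs = [accuracy(train(train_set, seed), test_set) for seed in range(10)]
print(f"test accuracy over 10 seeds: min {min(accs):.3f}, max {max(accs):.3f}")

# Pinning the seed fixes both NI factors, so the trained model is reproduced exactly.
assert train(train_set, seed=42) == train(train_set, seed=42)
```

In this toy setting, fixing the seed makes a single run reproducible, but it only picks one point from the distribution of possible outcomes; repeating the experiment over many seeds is what exposes the variance.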

Taxonomy

Topics: Software Engineering Research · Software Reliability and Analysis Research · Software Testing and Debugging Techniques