The Prevalence of Errors in Machine Learning Experiments

Martin Shepperd; Yuchen Guo; Ning Li; Mahir Arzoky; Andrea Capiluppi; Steve Counsell; Giuseppe Destefanis; Stephen Swift; Allan Tucker; and Leila Yousefi

arXiv:1909.04436·cs.LG·September 11, 2019

TL;DR

This study reveals a high prevalence of simple arithmetic and statistical errors in machine learning experiments within software defect prediction, emphasizing the need for improved transparency and verification practices.

Contribution

The paper systematically identifies and quantifies common errors in the reporting of ML experiments, highlighting the importance of open science principles for reliability.

Findings

01. 22 out of 49 papers contained errors

02. 7 papers had statistical errors

03. 16 papers had confusion matrix inconsistencies

Abstract

Context: Conducting experiments is central to machine learning research in order to benchmark, evaluate and compare learning algorithms. Consequently, it is important that we conduct reliable, trustworthy experiments. Objective: We investigate the incidence of errors in a sample of machine learning experiments in the domain of software defect prediction. Our focus is simple arithmetical and statistical errors. Method: We analyse 49 papers describing 2456 individual experimental results from a previously undertaken systematic review comparing supervised and unsupervised defect prediction classifiers. We extract the confusion matrices and test for relevant constraints, e.g., the marginal probabilities must sum to one. We also check for multiple statistical significance testing errors. Results: We find that a total of 22 out of 49 papers contain demonstrable errors. Of these 7 were statistical…
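The kind of constraint check mentioned in the Method can be illustrated with a minimal sketch. It is not the authors' tooling: the class `ConfusionMatrix`, the functions `check_proportions` and `recall_is_consistent`, and both tolerance values are hypothetical names and settings chosen for illustration. The sketch assumes a confusion matrix reported as proportions, so the four cells should each lie in [0, 1] and sum to one, and a reported recall should agree with TP / (TP + FN) up to rounding.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Confusion matrix cells expressed as proportions of all instances (assumed)."""
    tp: float
    fp: float
    fn: float
    tn: float


def check_proportions(cm: ConfusionMatrix, tol: float = 1e-6) -> list[str]:
    """Return a description of each violated constraint (empty list if none)."""
    problems = []
    cells = {"tp": cm.tp, "fp": cm.fp, "fn": cm.fn, "tn": cm.tn}
    # Each cell is a proportion, so it must lie in [0, 1].
    for name, value in cells.items():
        if value < 0 or value > 1:
            problems.append(f"{name} = {value} is outside [0, 1]")
    # The four cells partition the data set, so they must sum to one.
    total = sum(cells.values())
    if abs(total - 1.0) > tol:
        problems.append(f"cells sum to {total:.4f}, not 1")
    return problems


def recall_is_consistent(cm: ConfusionMatrix, reported_recall: float,
                         tol: float = 0.005) -> bool:
    """Check a reported recall against TP / (TP + FN), allowing for rounding.

    The tolerance is an illustrative assumption (results rounded to 2 d.p.).
    """
    positives = cm.tp + cm.fn
    if positives == 0:
        # Recall is undefined without positive instances; treat as not verifiable.
        return False
    return abs(cm.tp / positives - reported_recall) <= tol
```

For example, `check_proportions(ConfusionMatrix(0.5, 0.2, 0.3, 0.4))` flags that the cells sum to 1.4, while a matrix with cells 0.1, 0.2, 0.3 and 0.4 passes. Returning a list of messages rather than a single boolean makes it easier to record which constraint a given result violates.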
