The challenge of reproducible ML: an empirical study on the impact of bugs
Emilio Rivera-Landos, Foutse Khomh, Amin Nikanjam

TL;DR
This paper investigates the impact of bugs in ML libraries on experiment reproducibility, introduces ReproduceML for deterministic evaluation, and finds limited evidence that bugs in PyTorch affect model performance.
Contribution
The paper presents ReproduceML, a framework for the deterministic evaluation of ML experiments, and proposes a methodology for studying the effects of bugs on reproducibility.
Findings
No significant impact of PyTorch bugs on model performance found
ReproduceML enables controlled, reproducible ML experiments
Methodology facilitates further research on non-determinism in ML
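ReproduceML's own interface is not described on this page. The following is therefore only a minimal, hypothetical sketch of what checking a "controlled, reproducible experiment" can look like, assuming that all randomness (data noise, initialisation, shuffling) flows through a single seeded generator. `run_experiment` and `is_reproducible` are illustrative names, not part of ReproduceML, and the toy SGD fit stands in for a real training job.

```python
import random


def run_experiment(seed: int, epochs: int = 5) -> float:
    """Hypothetical toy experiment: fit y = 2x + 1 with per-sample SGD.

    Every random choice comes from one seeded generator, so the returned
    mean squared error depends only on `seed` (and the fixed code path).
    """
    rng = random.Random(seed)
    # Synthetic data with seeded Gaussian noise.
    data = [(x / 10, 2 * (x / 10) + 1 + rng.gauss(0, 0.1)) for x in range(50)]
    # Seeded parameter initialisation.
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    lr = 0.05
    for _ in range(epochs):
        rng.shuffle(data)  # data order per epoch is also fixed by the seed
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    # Final mean squared error over the training data.
    return sum(((w * x + b) - y) ** 2 for x, y in data) / len(data)


def is_reproducible(seed: int, runs: int = 3) -> bool:
    """Re-run the experiment and require bitwise-identical results."""
    results = [run_experiment(seed) for _ in range(runs)]
    return all(r == results[0] for r in results)


print(is_reproducible(42))
```

The check deliberately uses exact equality rather than a tolerance: the goal of a deterministic evaluation is bitwise-identical results, and any tolerance would hide exactly the small run-to-run drift that non-determinism introduces.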
Abstract
Reproducibility is a crucial requirement in scientific research. When the results of research studies and scientific papers are found to be difficult or impossible to reproduce, we face what is known as the reproducibility crisis. Although the demand for reproducibility in Machine Learning (ML) is acknowledged in the literature, a main barrier is the inherent non-determinism in ML training and inference. In this paper, we establish the fundamental factors that cause non-determinism in ML systems. We then introduce ReproduceML, a framework for the deterministic evaluation of ML experiments in a real, controlled environment. ReproduceML allows researchers to investigate the effects of software configuration on ML training and inference. Using ReproduceML, we run a case study investigating the impact of bugs inside ML libraries on the performance of ML experiments. This study attempts to quantify…
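The abstract does not list the non-determinism factors here, so the following is a hedged illustration of one widely recognized source, not necessarily one the authors enumerate: floating-point addition is not associative, so a parallel reduction whose accumulation order varies between runs (for example, one built on atomic adds on a GPU) can return slightly different results for identical inputs. The sketch simulates this on a CPU by shuffling the summation order. `ordered_sum` is a hypothetical helper, and `math.fsum` appears only as an order-independent reference, not as what ML libraries do by default.

```python
import math
import random


def ordered_sum(values: list[float], seed: int) -> float:
    """Sum `values` in a seed-dependent order.

    This mimics, on a single CPU thread, a parallel reduction whose
    accumulation order can change from run to run.
    """
    order = list(values)
    random.Random(seed).shuffle(order)
    total = 0.0
    for v in order:
        total += v  # each addition rounds to the nearest double
    return total


# Floating-point addition is not associative:
# (0.1 + 0.2) + 0.3 == 0.6000000000000001, while 0.1 + (0.2 + 0.3) == 0.6
assert (0.1 + 0.2) + 0.3 != 0.1 + (0.2 + 0.3)

rng = random.Random(123)
values = [rng.uniform(-1e6, 1e6) for _ in range(10_000)]

# Same order (same seed) gives a bitwise-identical result.
assert ordered_sum(values, seed=5) == ordered_sum(values, seed=5)

# Different orders typically give sums that differ in the last bits.
distinct = {ordered_sum(values, seed=s) for s in range(10)}
print(f"{len(distinct)} distinct sums from 10 orderings")

# math.fsum is correctly rounded, hence independent of input order.
assert math.fsum(values) == math.fsum(reversed(values))
```

Fixing the order (as a deterministic kernel would) restores bitwise reproducibility, usually at some cost in parallelism; correctly rounded summation is another option but is rarely used inside training loops.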
Taxonomy
Topics: Machine Learning and Data Classification · Software Engineering Research · Explainable Artificial Intelligence (XAI)
