The challenge of reproducible ML: an empirical study on the impact of bugs

Emilio Rivera-Landos; Foutse Khomh; Amin Nikanjam

arXiv:2109.03991 · cs.SE · September 10, 2021

TL;DR

This paper investigates the impact of bugs in ML libraries on experiment reproducibility, introduces ReproduceML for deterministic evaluation, and finds limited evidence that bugs in PyTorch affect model performance.

Contribution

The paper presents ReproduceML, a framework for the deterministic evaluation of ML experiments, and proposes a methodology for studying the effects of bugs on reproducibility.
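The core idea of deterministic evaluation can be illustrated with a minimal sketch. It does not reproduce ReproduceML's actual interface: `train_toy_model`, `fingerprint`, and `is_reproducible` are hypothetical names, and a tiny pure-Python SGD loop stands in for a real training job. All randomness flows from one seeded generator, and reproducibility is checked by hashing the exact bit patterns of the learned parameters across repeated runs.

```python
import hashlib
import random
import struct


def train_toy_model(seed: int, steps: int = 200) -> list[float]:
    # Hypothetical stand-in for an ML experiment: fit y = 3x + 1 with SGD.
    # All randomness (initial weights, sample order) comes from one seeded
    # generator, so the result depends only on `seed`.
    rng = random.Random(seed)
    w, b = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
    data = [(x / 10, 3 * (x / 10) + 1) for x in range(20)]
    lr = 0.05
    for _ in range(steps):
        x, y = rng.choice(data)   # seeded choice of training sample
        err = (w * x + b) - y     # prediction error
        w -= lr * err * x         # gradient step on 0.5 * err**2
        b -= lr * err
    return [w, b]


def fingerprint(params: list[float]) -> str:
    # Hash the exact IEEE 754 bit patterns, not rounded values, so that
    # even a last-digit difference produces a different fingerprint.
    packed = struct.pack(f"<{len(params)}d", *params)
    return hashlib.sha256(packed).hexdigest()


def is_reproducible(seed: int, runs: int = 3) -> bool:
    # Re-run the same experiment and require every run to be bit-identical.
    hashes = {fingerprint(train_toy_model(seed)) for _ in range(runs)}
    return len(hashes) == 1


print(is_reproducible(seed=42))  # True: same seed, same code, same machine
```

Comparing raw bits rather than rounded metrics is a deliberate choice: it exposes even tiny drift of the kind a library bug or a non-deterministic kernel might introduce. In a real framework, a fixed seed alone does not control every source of variation (GPU kernels, thread scheduling, library versions), which is why a controlled environment is needed in addition to seeding.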

Findings

01 · No significant impact of PyTorch bugs on model performance was found

02 · ReproduceML enables controlled, reproducible ML experiments

03 · The methodology supports further research on non-determinism in ML

Abstract

Reproducibility is a crucial requirement in scientific research. When the results of research studies and scientific papers prove difficult or impossible to reproduce, we face a challenge known as the reproducibility crisis. Although the demand for reproducibility in Machine Learning (ML) is acknowledged in the literature, a major barrier is the inherent non-determinism of ML training and inference. In this paper, we establish the fundamental factors that cause non-determinism in ML systems. We then introduce ReproduceML, a framework for the deterministic evaluation of ML experiments in a real, controlled environment. ReproduceML allows researchers to investigate the effects of software configuration on ML training and inference. Using ReproduceML, we run a case study investigating the impact of bugs inside ML libraries on the performance of ML experiments. This study attempts to quantify…
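Floating-point arithmetic is one commonly cited source of the non-determinism mentioned above. The abstract does not enumerate the paper's factors, so treat this as an illustration rather than the authors' list. Addition of IEEE 754 doubles is not associative, so the same values reduced in a different order (as can happen when parallel kernels or thread scheduling change the accumulation order) can yield different results. The function names below are illustrative only.

```python
import math


def sequential_sum(values: list[float]) -> float:
    # Left-to-right accumulation, as a single thread would reduce a tensor.
    total = 0.0
    for v in values:
        total += v
    return total


def pairwise_sum(values: list[float]) -> float:
    # Tree-shaped reduction, similar in spirit to a parallel reduction.
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return pairwise_sum(values[:mid]) + pairwise_sum(values[mid:])


# At 1e16 the spacing between adjacent doubles is 2, so adding 1.0
# rounds away unless the large terms have already cancelled.
values = [1e16, 1.0, -1e16, 1.0]

print(sequential_sum(values))  # 1.0
print(pairwise_sum(values))    # 0.0
print(math.fsum(values))       # 2.0 (correctly rounded reference)
```

Both orders are legitimate ways to compute the same sum, yet they disagree with each other and with the correctly rounded result from `math.fsum`. During training, such last-bit differences can accumulate over many iterations, so bit-level reproducibility requires the reduction order itself to be fixed, not just the random seed.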

Code & Models

Repositories

swatlab/ml-frameworks-evaluation
pytorch · Official

Taxonomy

Topics: Machine Learning and Data Classification · Software Engineering Research · Explainable Artificial Intelligence (XAI)