Sources of Irreproducibility in Machine Learning: A Review
Odd Erik Gundersen, Kevin Coakley, Christine Kirkpatrick, Yolanda Gil

TL;DR
This paper reviews the causes of irreproducibility in machine learning, proposing a structured framework to understand how experiment design choices impact reproducibility and the validity of conclusions.
Contribution
It introduces a comprehensive framework linking experiment design factors to irreproducibility, aiding researchers in evaluating and improving ML study reproducibility.
Findings
Identified key factors affecting ML reproducibility
Organized factors within a scientific method-based framework
Demonstrated the framework with a model comparison study
Abstract
Background: Many published machine learning studies are irreproducible. Issues with methodology and a failure to properly account for variation introduced by the algorithms themselves or their implementations are cited as the main contributors to the irreproducibility. Problem: There exists no theoretical framework that relates experiment design choices to potential effects on the conclusions. Without such a framework, it is much harder for practitioners and researchers to evaluate experiment results and describe the limitations of experiments. The lack of such a framework also makes it harder for independent researchers to systematically attribute the causes of failed reproducibility experiments. Objective: The objective of this paper is to develop a framework that enables applied data science practitioners and researchers to understand which experiment design choices can lead to false…
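The point about variation introduced by the algorithms themselves can be illustrated with a minimal sketch. It is not taken from the paper: the function names, the assumed "true quality" values (0.80 and 0.79), and the Gaussian noise model are all hypothetical stand-ins for a real train-and-evaluate pipeline. It shows how a model comparison based on a single random seed can point the opposite way from a comparison averaged over many seeds.

```python
import random
import statistics


def noisy_score(true_quality: float, seed: int, noise: float = 0.05) -> float:
    """Hypothetical stand-in for one train-and-evaluate run.

    The seed-dependent Gaussian term mimics variation from random
    initialization, data shuffling, and similar sources.
    """
    rng = random.Random(seed)
    return true_quality + rng.gauss(0.0, noise)


def run_many(true_quality: float, seeds, offset: int = 0) -> list[float]:
    """Repeat the hypothetical experiment once per seed."""
    return [noisy_score(true_quality, s + offset) for s in seeds]


def flip_rate(scores_a, scores_b) -> float:
    """Fraction of paired runs in which B scores higher than A."""
    wins_b = sum(b > a for a, b in zip(scores_a, scores_b))
    return wins_b / len(scores_a)


seeds = range(1000)
scores_a = run_many(0.80, seeds)                 # assumed true quality 0.80
scores_b = run_many(0.79, seeds, offset=10_000)  # assumed true quality 0.79

print(f"A: mean={statistics.mean(scores_a):.3f} sd={statistics.stdev(scores_a):.3f}")
print(f"B: mean={statistics.mean(scores_b):.3f} sd={statistics.stdev(scores_b):.3f}")
print(f"B beats A in {flip_rate(scores_a, scores_b):.0%} of single-seed comparisons")
```

Under these assumptions the averaged scores separate the two models, while an individual seed may well favor the weaker one. Reporting the seeds used and the spread across runs lets an independent researcher tell a genuine difference from seed noise.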
Taxonomy
Topics: Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI)
