A Step Toward Quantifying Independently Reproducible Machine Learning Research
Edward Raff

TL;DR
This paper empirically investigates what makes machine learning research reproducible by manually attempting to implement 255 papers, recording features of each paper, and analyzing them statistically, replacing assumptions with quantifiable measures.
Contribution
It introduces a systematic, empirical approach to assess reproducibility in machine learning research by analyzing a large sample of papers without relying solely on code release.
Findings
Manual implementation of 255 papers from 1984-2017
Statistical analysis of features related to reproducibility
Highlights gaps between code availability and actual reproducibility
Abstract
What makes a paper independently reproducible? Debates on reproducibility center around intuition or assumptions but lack empirical results. Our field focuses on releasing code, which is important, but is not sufficient for determining reproducibility. We take the first step toward a quantifiable answer by manually attempting to implement 255 papers published from 1984 until 2017, recording features of each paper, and performing statistical analysis of the results. For each paper, we did not look at the authors' code, if released, in order to prevent bias toward discrepancies between code and paper.
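To make "recording features of each paper, and performing statistical analysis" concrete, here is a minimal sketch of how one binary paper feature could be tested for association with reproduction outcome. It is an illustration only: the abstract does not say which tests were used, so a two-sided Fisher exact test is swapped in here as an assumption, and the feature name, the helper names `contingency` and `fisher_exact_two_sided`, and the toy records are all hypothetical, not data from the paper.

```python
from math import comb


def contingency(records):
    """Count a 2x2 table from (has_feature, reproduced) boolean pairs.

    Returns (a, b, c, d) where
      a = feature present, reproduced      b = feature present, not reproduced
      c = feature absent,  reproduced      d = feature absent,  not reproduced
    """
    a = sum(1 for f, r in records if f and r)
    b = sum(1 for f, r in records if f and not r)
    c = sum(1 for f, r in records if not f and r)
    d = sum(1 for f, r in records if not f and not r)
    return a, b, c, d


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Hypothetical toy records: (paper_includes_pseudocode, reproduction_succeeded).
# These values are made up for illustration and are NOT taken from the paper.
toy_records = [
    (True, True), (True, True), (True, True), (True, False),
    (False, True), (False, False), (False, False), (False, False),
]

table = contingency(toy_records)
p_value = fisher_exact_two_sided(*table)
print(table, round(p_value, 4))
```

With only eight toy records the test has little power; the point is the shape of the analysis (one recorded feature, one outcome, one significance test), which would be repeated per feature with a correction for multiple comparisons in any real study.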
Taxonomy
Topics: Scientific Computing and Data Management · Explainable Artificial Intelligence (XAI)
