More Rigorous Software Engineering Would Improve Reproducibility in Machine Learning Research
Moritz Wolter, Lokesh Veeramacheneni, Charles Tapley Hoyt

TL;DR
This paper highlights the importance of rigorous software engineering practices to enhance reproducibility in machine learning research, supported by a survey of current practices and concrete community recommendations.
Contribution
It provides an empirical analysis of software best practices in ML research and offers specific recommendations to improve reproducibility.
Findings
Software practices are often undervalued in ML research.
There are significant gaps in reproducibility-related practices.
Community guidelines can improve research reliability.
Abstract
While experimental reproduction remains a pillar of the scientific method, we observe that the software best practices supporting the reproduction of machine learning (ML) research are often undervalued or overlooked, leading both to poor reproducibility and to damaged trust in the ML community. We quantify these concerns by surveying the usage of software best practices in software repositories associated with publications at major ML conferences and journals such as NeurIPS, ICML, ICLR, TMLR, and MLOSS within the last decade. We report the results of this survey, which identify areas where software best practices are lacking and areas with potential for growth in the ML community. Finally, we discuss the implications and present concrete recommendations on how we, as a community, can improve reproducibility in ML research.
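As a rough illustration of what surveying repositories for software best practices can involve, the sketch below checks a local repository checkout for a few common indicators and computes per-indicator adoption rates across many repositories. This is not the authors' survey methodology. The indicator names and the file paths treated as evidence (for example `LICENSE`, `requirements.txt`, `tests/`, `.github/workflows`) are illustrative assumptions, as are the helper names `check_repository` and `adoption_rate`.

```python
from pathlib import Path
from typing import Dict, List, Union

# Hypothetical indicators: each maps to paths whose presence we treat as evidence.
# These are illustrative assumptions, not the criteria used in the paper's survey.
INDICATORS: Dict[str, List[str]] = {
    "license": ["LICENSE", "LICENSE.txt", "LICENSE.md"],
    "readme": ["README.md", "README.rst", "README.txt", "README"],
    "dependencies": ["requirements.txt", "pyproject.toml", "setup.py", "environment.yml"],
    "tests": ["tests", "test"],
    "ci": [".github/workflows", ".gitlab-ci.yml"],
}


def check_repository(root: Union[str, Path]) -> Dict[str, bool]:
    """Report, for each indicator, whether any of its candidate paths exists under root."""
    root = Path(root)
    return {
        name: any((root / candidate).exists() for candidate in candidates)
        for name, candidates in INDICATORS.items()
    }


def adoption_rate(results: List[Dict[str, bool]]) -> Dict[str, float]:
    """Fraction of surveyed repositories in which each indicator is present."""
    if not results:
        return {name: 0.0 for name in INDICATORS}
    return {
        name: sum(result[name] for result in results) / len(results)
        for name in INDICATORS
    }
```

A file-presence check like this is deliberately shallow: it shows whether a practice is visible in a repository, not whether the tests pass or the pinned dependencies actually reproduce the reported results.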
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Scientific Computing and Data Management · Explainable Artificial Intelligence (XAI)
