More Rigorous Software Engineering Would Improve Reproducibility in Machine Learning Research

Moritz Wolter; Lokesh Veeramacheneni; Charles Tapley Hoyt

arXiv:2502.00902 · cs.SE · September 3, 2025

TL;DR

This paper argues that more rigorous software engineering would improve reproducibility in machine learning research. It backs the argument with a survey of software practices in repositories associated with publications at major ML venues and closes with concrete recommendations for the community.

Contribution

The paper provides an empirical analysis of how software best practices are used in repositories accompanying ML research and offers concrete recommendations to improve reproducibility.

Findings

01. Software practices are often undervalued in ML research.

02. There are significant gaps in reproducibility-related practices.

03. Community guidelines can improve research reliability.

Abstract

While experimental reproduction remains a pillar of the scientific method, we observe that the software best practices supporting the reproduction of machine learning (ML) research are often undervalued or overlooked, leading both to poor reproducibility and damage to trust in the ML community. We quantify these concerns by surveying the usage of software best practices in software repositories associated with publications at major ML conferences and journals such as NeurIPS, ICML, ICLR, TMLR, and MLOSS within the last decade. We report the results of this survey that identify areas where software best practices are lacking and areas with potential for growth in the ML community. Finally, we discuss the implications and present concrete recommendations on how we, as a community, can improve reproducibility in ML research.

Code & Models

Repositories

BonnBytes/position_we_need_more_tests_in_ml (JAX, official)

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Scientific Computing and Data Management · Explainable Artificial Intelligence (XAI)