Reproducibility in Computational Materials Science: Lessons from 'A General-Purpose Machine Learning Framework for Predicting Properties of Inorganic Materials'
Daniel Persaud, Logan Ward, Jason Hattrick-Simpers

TL;DR
This paper examines reproducibility challenges in computational materials science, specifically analyzing issues encountered when reproducing results from a machine learning framework for inorganic materials, and proposes actionable solutions.
Contribution
It identifies key reproducibility barriers in computational materials science and offers practical recommendations to improve code accessibility and transparency.
Findings
Reproducibility issues stem from unreported computational dependencies and version logs that are not recorded or shared.
Gaps in sequential code organization and unclear code references within the manuscript also hinder reproducibility.
Proposed action items aim to enhance code sharing and reproducibility.
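The findings name unreported dependencies and missing version logs as barriers. The sketch below shows one way a Python workflow could record its environment using only the standard library's `importlib.metadata`. It is not the authors' tooling. The function names `collect_version_log` and `write_version_log`, and the JSON layout, are hypothetical choices made for illustration.

```python
import json
import platform
from datetime import datetime, timezone
from importlib import metadata


def collect_version_log(packages):
    """Return a dict recording the interpreter, OS, and installed package versions.

    `packages` is a hypothetical list of distribution names the workflow depends on.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record missing packages explicitly instead of silently skipping them
            versions[name] = None
    return {
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


def write_version_log(path, packages):
    """Write the version log to `path` as JSON and return it."""
    log = collect_version_log(packages)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(log, fh, indent=2, sort_keys=True)
    return log
```

For example, calling `write_version_log("version_log.json", ["numpy", "scikit-learn"])` at the start of an analysis would save the versions alongside the results, so a later reader can see exactly which environment produced them.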
Abstract
The integration of machine learning techniques in materials discovery has become prominent in materials science research and has been accompanied by an increasing trend towards open-source data and tools to propel the field. Despite the increasing usefulness and capabilities of these tools, developers neglecting to follow reproducible practices creates a significant barrier for researchers looking to use or build upon their work. In this study, we investigate the challenges encountered while attempting to reproduce a section of the results presented in "A general-purpose machine learning framework for predicting properties of inorganic materials." Our analysis identifies four major categories of challenges: (1) reporting computational dependencies, (2) recording and sharing version logs, (3) sequential code organization, and (4) clarifying code references within the manuscript. The…
Taxonomy
Topics: Machine Learning in Materials Science · Scientific Computing and Data Management · Research Data Management Practices
