Bridging the Reproducibility Divide: Open Source Software's Role in Standardizing Healthcare AI
John Wu, Zhenbang Wu, Jimeng Sun

TL;DR
This paper highlights the importance of open science practices in healthcare AI to improve reproducibility, trust, and impact, emphasizing the need for standardized data preprocessing, open datasets, and shared code.
Contribution
It identifies current reproducibility challenges in healthcare AI and advocates for open practices, standardization, and benchmarks to enhance trust and effectiveness.
Findings
74% of AI4H papers rely on private datasets or do not share their code.
Papers that use public datasets and share code receive over 100% more citations.
Standardized guidelines can improve model reproducibility and trustworthiness.
Abstract
Our analysis of recent AI4H publications reveals that, despite a trend toward utilizing open datasets and sharing modeling code, 74% of AI4H papers still rely on private datasets or do not share their code. This is especially concerning in healthcare applications, where trust is essential. Furthermore, inconsistent and poorly documented data preprocessing pipelines result in variable model performance reports, even for identical tasks and datasets, making it challenging to evaluate the true effectiveness of AI models. Despite the challenges posed by the reproducibility crisis, addressing these issues through open practices offers substantial benefits. For instance, while the reproducibility mandate adds extra effort to research and publication, it significantly enhances the impact of the work. Our analysis shows that papers that used both public datasets and shared code received, on…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Scientific Computing and Data Management · Electronic Health Records Systems
