TL;DR
This study evaluates the reproducibility of R code supplements shared on OSF, finds low execution success rates caused by missing dependencies and system-level issues, and proposes automated tooling for verifying reproducibility.
Contribution
The paper introduces an automated pipeline for reconstructing R project environments and assessing reproducibility, highlighting key barriers and solutions for scientific code sharing.
Findings
25.87% of R scripts ran successfully in Docker containers
98.8% of projects lacked formal dependency documentation
Automated dependency inference improves reproducibility verification
Abstract
Computational reproducibility is fundamental to scientific research, yet many published code supplements lack the necessary documentation to recreate their computational environments. While researchers increasingly share code alongside publications, the actual reproducibility of these materials remains poorly understood. In this work, we assess the computational reproducibility of 296 R projects using the StatCodeSearch dataset. Of these, only 264 were still retrievable, and 98.8% lacked formal dependency descriptions required for successful execution. To address this, we developed an automated pipeline that reconstructs computational environments directly from project source code. Applying this pipeline, we executed the R scripts within custom Docker containers and found that 25.87% completed successfully without error. We conducted a detailed analysis of execution failures,…
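The abstract does not specify how the pipeline infers dependencies from source code. As a rough illustration of the general idea only, the sketch below scans R source text for `library()`, `require()`, `requireNamespace()`, and `loadNamespace()` calls and for `pkg::fun` references. The function name `infer_r_packages` and the regular expressions are assumptions made for this example, not the authors' implementation.

```python
import re

# Hypothetical patterns for this sketch; the paper's pipeline may work differently.
# Matches library(pkg), require("pkg"), requireNamespace("pkg"), loadNamespace("pkg").
_CALL_RE = re.compile(
    r"""\b(?:library|require|requireNamespace|loadNamespace)\s*\(\s*["']?([A-Za-z][A-Za-z0-9.]*)"""
)
# Matches namespace-qualified references such as pkg::fun and pkg:::fun.
_NS_RE = re.compile(r"\b([A-Za-z][A-Za-z0-9.]*):::?")


def infer_r_packages(source: str) -> list[str]:
    """Return a sorted list of package names referenced in R source text.

    Limitations (by design, to keep the sketch short): a '#' inside a string
    literal is treated as a comment start, and dynamic loads such as
    library(pkg_name, character.only = TRUE) are reported literally.
    """
    packages: set[str] = set()
    for line in source.splitlines():
        code = line.split("#", 1)[0]  # naive comment stripping
        packages.update(_CALL_RE.findall(code))
        packages.update(_NS_RE.findall(code))
    return sorted(packages)


if __name__ == "__main__":
    example = (
        "library(dplyr)\n"
        'require("ggplot2")  # plotting\n'
        "# library(ignored)\n"
        "x <- stats::sd(y)\n"
    )
    print(infer_r_packages(example))
```

A list inferred this way could then feed the package installation step of a container build. A practical tool would probably also filter out packages that ship with base R (such as `stats`) and resolve system-level libraries, which this sketch does not attempt.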
