ReproScore: Separating Readiness from Outcome in Research Software Reproducibility Assessment
Sheeba Samuel, Daniel Mietchen, Jungsan Kim, Waqas Ahmed, Martin Gaedke

TL;DR
ReproScore introduces a two-tier framework to distinguish between software readiness and actual reproducibility outcomes, enabling scalable assessment of research software in digital libraries.
Contribution
The paper separates readiness metrics from outcome metrics, combines them into a composite score, and validates the separation on GitHub repositories drawn from a large-scale ground-truth corpus.
Findings
The Environment category strongly discriminates between failure modes.
Readiness metrics show near-zero correlation with execution success.
Separating readiness metrics from outcome metrics is necessary, and the evaluation validates it.
Abstract
Digital libraries curate millions of research software artefacts yet lack scalable infrastructure for assessing whether those artefacts remain executable. Existing automated assessment tools treat static repository completeness -- what a repository contains -- as a proxy for execution success -- whether it runs. We term this the readiness-outcome conflation and present ReproScore, a two-tier framework that explicitly separates reproducibility readiness (RRS) from reproducibility outcome (ROS), combining them into a coverage-adaptive Composite Score (RCS). RRS comprises 26 sub-metrics across five categories; ROS provides execution-based probes when sandbox infrastructure is available; a community rubric externalises weighting priorities as versioned YAML profiles. Evaluated on 423 GitHub repositories from a large-scale ground-truth corpus spanning five failure modes, two complementary…
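To make the two-tier idea concrete, here is a minimal sketch of how a readiness score and an optional outcome score might be blended by execution coverage. It is an illustration under stated assumptions, not the paper's formula: the function names `weighted_mean` and `composite_score`, the linear blend, the `coverage` parameter, and every category name other than Environment are hypothetical.

```python
from typing import Dict, Optional


def weighted_mean(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-category scores in [0, 1].

    The weights stand in for a rubric profile; in the paper these are
    externalised as versioned YAML profiles, here they are a plain dict.
    """
    total = sum(weights[name] for name in scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * weights[name] for name in scores) / total


def composite_score(rrs: float, ros: Optional[float], coverage: float) -> float:
    """Blend readiness (RRS) and outcome (ROS) by execution coverage.

    ASSUMPTION: a linear blend. With no sandbox run (ros is None or
    coverage is 0) the score falls back to readiness alone; with full
    coverage it equals the outcome score.
    """
    if ros is None or coverage <= 0:
        return rrs
    c = min(coverage, 1.0)  # clamp coverage to at most 1
    return (1.0 - c) * rrs + c * ros


# Hypothetical category scores; only "environment" is named in the source.
readiness = weighted_mean(
    {"environment": 0.5, "documentation": 1.0},
    {"environment": 3.0, "documentation": 1.0},
)
print(readiness)                             # readiness-only view
print(composite_score(readiness, None, 0.0))  # no sandbox available
print(composite_score(readiness, 0.2, 0.5))   # half of the probes executed
```

Keeping `rrs` and `ros` as separate inputs mirrors the separation the abstract argues for: a repository can score well on readiness and still receive a low composite once execution evidence is available.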
