Containing the Reproducibility Gap: Automated Repository-Level Containerization for Scholarly Jupyter Notebooks
Sheeba Samuel, Daniel Mietchen, Hemanta Lo, Martin Gaedke

TL;DR
This paper introduces an automated pipeline that reconstructs and evaluates execution environments for scholarly Jupyter notebooks. The pipeline improves execution robustness and identifies the challenges that still prevent full reproducibility.
Contribution
It presents a scalable, automated system for containerizing and testing notebooks at the repository level, addressing environment drift and dependency issues.
Findings
Containerization resolves 66.7% of dependency failures.
53.7% of notebooks show low output fidelity due to runtime issues.
The approach improves execution robustness but does not achieve full reproducibility.
Abstract
Computational reproducibility is fundamental to trustworthy science, yet remains difficult to achieve in practice across various research workflows, including Jupyter notebooks published alongside scholarly articles. Environment drift, undocumented dependencies and implicit execution assumptions frequently prevent independent re-execution of published research. Despite existing reproducibility guidelines, scalable and systematic infrastructure for automated assessment remains limited. We present an automated, web-oriented reproducibility engineering pipeline that reconstructs and evaluates repository-level execution environments for scholarly notebooks. The system performs dependency inference, automated container generation, and isolated execution to approximate the notebook's original computational context. We evaluate the approach on 443 notebooks from 116 GitHub repositories…
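The abstract names three stages: dependency inference, automated container generation, and isolated execution. The following is a minimal sketch of the first two under stated assumptions, not the authors' implementation. The function names `infer_imports` and `render_dockerfile`, the base image `python:3.11-slim`, and the file name `notebook.ipynb` are all hypothetical. The sketch also assumes that an import name equals its PyPI package name, which is not always true (for example, `sklearn` is installed as `scikit-learn`).

```python
import ast
import json
import sys
from typing import Iterable, Set


def infer_imports(notebook_json: str) -> Set[str]:
    """Collect top-level third-party module names imported in code cells."""
    nb = json.loads(notebook_json)
    modules: Set[str] = set()
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat stores source as str or list of str
            source = "".join(source)
        # Skip IPython magics and shell escapes, which are not valid Python.
        kept = [ln for ln in source.splitlines()
                if not ln.lstrip().startswith(("%", "!"))]
        try:
            tree = ast.parse("\n".join(kept))
        except SyntaxError:
            continue  # unparsable cell: ignore it rather than guess
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                modules.update(a.name.split(".")[0] for a in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                modules.add(node.module.split(".")[0])
    # sys.stdlib_module_names exists on Python 3.10+; older versions filter nothing.
    stdlib = getattr(sys, "stdlib_module_names", frozenset())
    return {m for m in modules if m not in stdlib}


def render_dockerfile(packages: Iterable[str],
                      base_image: str = "python:3.11-slim") -> str:
    """Render a Dockerfile that installs the packages and executes a notebook."""
    # Assumption: import names are used directly as pip package names.
    pkgs = sorted(set(packages) | {"nbconvert", "ipykernel"})
    return "\n".join([
        f"FROM {base_image}",
        "WORKDIR /work",
        "COPY . /work",
        "RUN pip install --no-cache-dir " + " ".join(pkgs),
        # notebook.ipynb is a placeholder file name.
        'CMD ["jupyter", "nbconvert", "--to", "notebook", '
        '"--execute", "--inplace", "notebook.ipynb"]',
    ]) + "\n"


if __name__ == "__main__":
    demo = json.dumps({"cells": [
        {"cell_type": "code",
         "source": ["%matplotlib inline\n", "import numpy as np\n"]},
        {"cell_type": "code", "source": "from pandas.io import json as pj"},
    ]})
    print(render_dockerfile(infer_imports(demo)))
```

The third stage, isolated execution, would then build and run the resulting image. A real pipeline would also need to pin package versions and honor any `requirements.txt` or `environment.yml` file already present in the repository.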
