A Framework to capture and reproduce the Absolute State of Jupyter Notebooks
Dimuthu Wannipurage, Suresh Marru, Marlon Pierce

TL;DR
This paper introduces a framework using Jupyter's extension mechanisms to capture and reproduce the complete, executable state of notebooks, enhancing reproducibility across different environments with minimal performance impact.
Contribution
It presents a novel system for archiving and restoring the full state of Jupyter Notebooks, including code, environment, and runtime variables, using standard extensions.
Findings
Minimal execution overhead observed
Successful replication of notebook state across environments
Enhanced reproducibility of computational research
Abstract
Jupyter Notebooks are an enormously popular tool for creating and narrating computational research projects. They also have enormous potential for creating reproducible scientific research artifacts. Capturing the complete state of a notebook has additional benefits; for instance, the notebook execution may be split between local and remote resources, where the latter may have more powerful processing capabilities or store large or access-limited data. Examined in detail, making notebooks fully reproducible poses several challenges. The notebook code must be replicated entirely, and the underlying Python runtime environments must be identical. More subtle problems arise in replicating referenced data, external library dependencies, and runtime variable states. This paper presents solutions to these problems using Jupyter's standard extension mechanisms to create an…
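To make the idea of capturing runtime variable states and library dependencies concrete, here is a minimal Python sketch. It is not the framework described in the paper, whose implementation details are not given in this summary. The function names `capture_state` and `restore_state` are hypothetical, and the sketch assumes that pickling is an acceptable way to serialize variables and that installed package versions are an adequate stand-in for the environment.

```python
import pickle
import sys
from importlib import metadata


def capture_state(namespace):
    """Snapshot picklable variables plus basic environment information.

    Hypothetical helper for illustration only; not the paper's API.
    """
    variables = {}
    skipped = []
    for name, value in namespace.items():
        if name.startswith("_"):
            continue  # ignore private and interpreter-internal names
        try:
            variables[name] = pickle.dumps(value)
        except Exception:
            # Modules, open files, sockets, etc. cannot be pickled.
            skipped.append(name)

    # Record installed distributions so the environment can be compared later.
    packages = {}
    for dist in metadata.distributions():
        dist_name = dist.metadata.get("Name")
        if dist_name:
            packages[dist_name] = dist.version

    return {
        "python": sys.version.split()[0],
        "packages": packages,
        "variables": variables,
        "skipped": sorted(skipped),
    }


def restore_state(snapshot, namespace):
    """Load the pickled variables from a snapshot back into a namespace."""
    for name, blob in snapshot["variables"].items():
        namespace[name] = pickle.loads(blob)
    return namespace


if __name__ == "__main__":
    # 'sys' is included on purpose to show an unpicklable value being skipped.
    session = {"threshold": 0.5, "samples": [1, 2, 3], "sys": sys}
    snapshot = capture_state(session)
    print(snapshot["skipped"])
    print(restore_state(snapshot, {}))
```

Inside a running notebook, the namespace passed in could be IPython's `get_ipython().user_ns`. A real system would also need to handle referenced data files and objects that pickle cannot serialize, which this sketch only records as skipped.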
Taxonomy
Topics: Scientific Computing and Data Management · Advanced Data Storage Technologies · Parallel Computing and Optimization Techniques
