Formal Definition and Implementation of Reproducibility Tenets for Computational Workflows
Nicholas J. Pritchard, Andreas Wicenec

TL;DR
This paper introduces a formal, system-agnostic workflow model with reproducibility tenets, a cryptographic signature method, and an implementation in DALiuGE to enhance reproducibility and verification in scientific workflows, demonstrated through astronomical data processing.
Contribution
It extends five well-known reproducibility concepts into seven formal tenets, develops a cryptographic signature method for workflow executions, and implements both in DALiuGE to improve reproducibility verification.
Findings
Workflow signatures can be generated in amortized constant time.
The approach facilitates automatic formal verification of scientific workflows.
The approach is demonstrated on astronomical data-processing tasks.
Abstract
Computational workflow management systems power contemporary data-intensive sciences. The slowly resolving reproducibility crisis presents both a sobering warning and an opportunity to iterate on what science and data processing entail. The Square Kilometre Array (SKA), the world's largest radio telescope, is among the most extensive scientific projects underway and presents grand scientific collaboration and data-processing challenges. In this work, we aim to improve the ability of workflow management systems to facilitate reproducible, high-quality science. This work presents a scale- and system-agnostic computational workflow model and extends five well-known reproducibility concepts into seven well-defined tenets for this workflow model. Additionally, we present a method to construct workflow execution signatures using cryptographic primitives in amortized constant time. We combine…
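The abstract does not spell out how the signatures are built. One plausible construction, offered here purely as an illustrative assumption and not as the authors' DALiuGE implementation, is a Merkle-style scheme over the workflow graph: each node's signature is a SHA-256 hash of its own record together with its parents' signatures, so any upstream change propagates to every descendant. The function names (`node_signature`, `workflow_signatures`, `workflow_signature`) and the byte-string "payload" per node are hypothetical.

```python
import hashlib
from graphlib import TopologicalSorter


def node_signature(payload: bytes, parent_sigs: list[str]) -> str:
    """Hash a node's own record together with its parents' signatures.

    Parent signatures are sorted so the result does not depend on the
    order in which edges were declared.
    """
    h = hashlib.sha256()
    h.update(payload)
    for sig in sorted(parent_sigs):
        h.update(bytes.fromhex(sig))
    return h.hexdigest()


def workflow_signatures(payloads: dict[str, bytes],
                        parents: dict[str, list[str]]) -> dict[str, str]:
    """Sign every node exactly once, visiting parents before children."""
    graph = {node: parents.get(node, []) for node in payloads}
    sigs: dict[str, str] = {}
    for node in TopologicalSorter(graph).static_order():
        # Parents were visited earlier, so their signatures are already cached.
        sigs[node] = node_signature(payloads[node],
                                    [sigs[p] for p in parents.get(node, [])])
    return sigs


def workflow_signature(sigs: dict[str, str],
                       parents: dict[str, list[str]]) -> str:
    """Combine the signatures of sink nodes (nodes with no children)."""
    has_child = {p for ps in parents.values() for p in ps}
    h = hashlib.sha256()
    for sig in sorted(sigs[n] for n in sigs if n not in has_child):
        h.update(bytes.fromhex(sig))
    return h.hexdigest()


if __name__ == "__main__":
    # Hypothetical three-step pipeline; each payload stands in for whatever
    # a node's record would contain (parameters, code version, data hash).
    payloads = {"ingest": b"raw.ms", "calibrate": b"cal v1", "image": b"clean"}
    parents = {"calibrate": ["ingest"], "image": ["calibrate"]}
    sigs = workflow_signatures(payloads, parents)
    print(workflow_signature(sigs, parents))
```

In this sketch each node is hashed once and reuses its parents' cached signatures, so total work grows with the number of nodes plus edges. When in-degree is bounded, that is constant work per node amortized over the workflow, which is one way to read the amortized-constant-time claim; it is not a description of how DALiuGE achieves it. Comparing two runs node by node then localizes where they first diverge.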
Taxonomy
Topics: Scientific Computing and Data Management · Distributed and Parallel Computing Systems · Research Data Management Practices
