Cloud Infrastructure Provenance Collection and Management to Reproduce Scientific Workflow Execution
Khawar Hasham, Kamran Munir, Richard McClatchey

TL;DR
This paper introduces ReCAP, a framework for capturing Cloud infrastructure provenance to enable reproducibility of scientific workflows by re-provisioning similar resources on the Cloud.
Contribution
The paper proposes a novel framework and mapping approaches for capturing Cloud-aware provenance, facilitating workflow reproducibility and resource re-provisioning without performance overhead.
Findings
Provenance collection impacts workflow performance analysis.
Mapping approaches effectively capture Cloud information across scenarios.
ReCAP enables accurate re-provisioning of resources for reproducibility.
Abstract
The emergence of Cloud computing provides a new computing paradigm for scientific workflow execution. It provides dynamic, on-demand and scalable resources that enable the processing of complex workflow-based experiments. With the ever-growing size of the experimental data and increasingly complex processing workflows, the need for reproducibility has also become essential. Provenance has been thought of as a mechanism to verify a workflow and to provide workflow reproducibility. One of the obstacles in reproducing an experiment execution is the lack of information about the execution infrastructure in the collected provenance. This information becomes critical in the context of Cloud, in which resources are provisioned on demand by specifying resource configurations. Therefore, a mechanism is required that enables capturing of infrastructure information along with the provenance of…
