Automatic vs Manual Provenance Abstractions: Mind the Gap

Pinar Alper; Khalid Belhajjame; Carole A. Goble

arXiv:1605.06669 · cs.SE · May 24, 2016 · 1 cites

Open Access

TL;DR

This paper compares manual and semi-automatic provenance abstraction techniques in scientific workflows, revealing significant differences in data artefacts retained and discussing implications for future research.

Contribution

It provides an empirical comparison between manual and semi-automatic provenance abstractions, highlighting their overlaps and differences in data retention.

Findings

01. Semi-automatic and manual abstractions largely overlap process-wise.

02. The data artefacts retained differ significantly between the two approaches.

03. The paper discusses reasons for this mismatch and directions for future research.

Abstract

In recent years, the need to simplify provenance or to hide sensitive information in it has given rise to research on provenance abstraction. In the context of scientific workflows, existing research provides techniques to semi-automatically create abstractions of a given workflow description, which are in turn used as filters over the workflow's provenance traces. An alternative approach, commonly adopted by scientists, is to build workflows with abstractions embedded into the workflow's design, for example by using sub-workflows. This paper reports on a comparison of manual versus semi-automated approaches in a context where result abstractions are used to filter report-worthy results of computational scientific analyses. Specifically, we take a real-world workflow containing user-created design abstractions and compare these with abstractions created by ZOOM UserViews and Workflow…

Taxonomy

Topics: Scientific Computing and Data Management · Research Data Management Practices · Distributed and Parallel Computing Systems