Trusted Provenance of Automated, Collaborative and Adaptive Data Processing Pipelines

Ludwig Stage; Dimka Karastoyanova

arXiv:2310.11442·cs.CR·October 18, 2023·1 cites

Open Access

TL;DR

This paper introduces a trusted provenance service architecture for collaborative, adaptive data processing pipelines, enabling secure tracking of changes and fostering trust in cross-organizational collaborations.

Contribution

It proposes a novel architecture and proof of concept for a provenance service that captures change history in collaborative data pipelines, addressing trust issues.

Findings

01 Designed a Provenance Holder service architecture

02 Implemented a proof of concept demonstrating trusted provenance tracking

03 Defined properties for trusted provenance services
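
To make the idea of tamper-evident change history concrete, the following is a minimal sketch of an append-only, hash-chained log of pipeline changes. It is an illustration under stated assumptions, not the Provenance Holder design from the paper: the class `ChangeLog`, its methods `record` and `validate`, and the entry fields are all hypothetical names chosen for this example.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # assumed placeholder hash for the first entry


@dataclass
class ChangeLog:
    """Hypothetical append-only log of pipeline changes (not the paper's API)."""
    entries: list = field(default_factory=list)

    @staticmethod
    def _digest(body: dict) -> str:
        # Canonical JSON so the same content always hashes identically.
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def record(self, author: str, change: str) -> dict:
        # Each entry commits to the hash of the previous one.
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"author": author, "change": change, "prev": prev}
        entry = {**body, "hash": self._digest(body)}
        self.entries.append(entry)
        return entry

    def validate(self) -> bool:
        # Recompute every hash and check that the chain links are intact.
        prev = GENESIS
        for e in self.entries:
            body = {"author": e["author"], "change": e["change"], "prev": e["prev"]}
            if e["prev"] != prev or e["hash"] != self._digest(body):
                return False
            prev = e["hash"]
        return True


log = ChangeLog()
log.record("org-a", "replace step 'clean' with 'clean_v2'")
log.record("org-b", "add step 'normalise'")
print(log.validate())  # chain is intact at this point

log.entries[0]["change"] = "tampered"
print(log.validate())  # the edit to an earlier entry is detected
```

Hash chaining on its own only makes later modification detectable. Establishing who made a change would additionally require something like digital signatures, which this sketch leaves out.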

Abstract

To benefit from the abundance of data and the insights it brings, data processing pipelines are being used in many areas of research and development in both industry and academia. One approach to automating data processing pipelines is workflow technology, as it also supports collaborative, trial-and-error experimentation with the pipeline architecture in different application domains. In addition to the flexibility that such pipelines need to possess, cross-organisational interactions in collaborative settings are plagued by a lack of trust. While capturing provenance information related to the pipeline execution and the processed data is a first step towards enabling trusted collaborations, current solutions do not support provenance of changes to the processing pipelines, where a change can be made to any aspect of the workflow implementing the…

Taxonomy

Topics: Scientific Computing and Data Management · Research Data Management Practices