Provenance Views for Module Privacy
Susan B. Davidson, Sanjeev Khanna, Tova Milo, Debmalya Panigrahi, Sudeepa Roy

TL;DR
This paper addresses protecting proprietary module information in scientific workflows by creating privacy-preserving views of provenance data, balancing data concealment costs with privacy guarantees.
Contribution
It formally defines the 'secureview' problem, analyzes its complexity, and proposes algorithms to generate privacy-preserving views in workflow provenance.
Findings
The 'secureview' problem is computationally complex.
Algorithms can effectively generate privacy-preserving views.
Trade-offs exist between data hiding costs and privacy levels.
Abstract
Scientific workflow systems increasingly store provenance information about the module executions used to produce a data item, as well as the parameter settings and intermediate data items passed between module executions. However, authors/owners of workflows may wish to keep some of this information confidential. In particular, a module may be proprietary, and users should not be able to infer its behavior by seeing mappings between all data inputs and outputs. The problem we address in this paper is the following: given a workflow, abstractly modeled by a relation R, a privacy requirement Γ, and costs associated with data, the owner of the workflow decides which data (attributes) to hide, and provides the user with a view R' which is the projection of R over the attributes that have not been hidden. The goal is to minimize the cost of hidden data while guaranteeing that individual…
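The setup in the abstract (a relation R, per-attribute hiding costs, a view R' obtained by projection, and a privacy requirement Γ) can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration: the toy module with boolean attributes a1..a4, the cost values, and the function names. Because the abstract is cut off before it states the privacy guarantee, the check below uses one plausible reading: every input must have at least Γ candidate outputs consistent with the view. The exhaustive search over attribute subsets is only a baseline for tiny examples, not the paper's algorithm.

```python
from itertools import combinations

# Hypothetical toy module: inputs a1, a2 -> outputs a3, a4 (all boolean).
ATTRS = ("a1", "a2", "a3", "a4")
INPUTS, OUTPUTS = ("a1", "a2"), ("a3", "a4")
R = [
    {"a1": 0, "a2": 0, "a3": 0, "a4": 1},
    {"a1": 0, "a2": 1, "a3": 1, "a4": 1},
    {"a1": 1, "a2": 0, "a3": 1, "a4": 0},
    {"a1": 1, "a2": 1, "a3": 1, "a4": 1},
]
COST = {"a1": 3, "a2": 2, "a3": 4, "a4": 1}  # made-up hiding costs


def project(relation, visible):
    """Return the view R': distinct tuples of R restricted to `visible`."""
    return sorted({tuple(row[a] for a in visible) for row in relation})


def candidate_outputs(relation, visible, x_row):
    """Count outputs consistent with the view for the input in `x_row`.

    Assumed privacy measure: rows that agree with x_row on the visible
    inputs contribute their visible-output values, and each hidden
    boolean output doubles the number of candidates.
    """
    vis_in = [a for a in INPUTS if a in visible]
    vis_out = [a for a in OUTPUTS if a in visible]
    hidden_out = [a for a in OUTPUTS if a not in visible]
    matching = {
        tuple(row[a] for a in vis_out)
        for row in relation
        if all(row[a] == x_row[a] for a in vis_in)
    }
    return len(matching) * 2 ** len(hidden_out)


def cheapest_view(relation, gamma):
    """Brute force: cheapest hidden set giving every input >= gamma candidates."""
    best = None
    for k in range(len(ATTRS) + 1):
        for hidden in combinations(ATTRS, k):
            visible = [a for a in ATTRS if a not in hidden]
            if all(candidate_outputs(relation, visible, r) >= gamma
                   for r in relation):
                cost = sum(COST[a] for a in hidden)
                if best is None or cost < best[0]:
                    best = (cost, hidden)
    return best


print(project(R, ["a1", "a3"]))
print(cheapest_view(R, 2))
print(cheapest_view(R, 4))
```

On this toy relation, raising Γ from 2 to 4 forces a more expensive set of attributes to be hidden, which is the cost/privacy trade-off the summary describes. The search enumerates all 2^n attribute subsets, so it is only usable on very small schemas.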
Taxonomy
Topics: Scientific Computing and Data Management · Privacy-Preserving Technologies in Data · Data Quality and Management
