On Optimizing the Trade-off between Privacy and Utility in Data Provenance
Daniel Deutch, Ariel Frankenthal, Amir Gilad, Yuval Moskovitch

TL;DR
This paper formalizes the tradeoff between privacy and utility in data provenance, proposing an optimization framework and heuristics to balance disclosure and confidentiality of query information.
Contribution
It introduces a novel formalization of provenance abstraction for privacy-utility tradeoff and develops practical heuristics for optimizing this balance.
Findings
Heuristic algorithms effectively balance privacy and utility in provenance data.
Experimental results demonstrate the approach's effectiveness on benchmark datasets.
The formalization provides a new perspective on privacy-preserving data provenance.
Abstract
Organizations that collect and analyze data may wish or be mandated by regulation to justify and explain their analysis results. At the same time, the logic that they have followed to analyze the data, i.e., their queries, may be proprietary and confidential. Data provenance, a record of the transformations that data underwent, has been extensively studied as a means of explanation. In contrast, only a few works have studied the tension between disclosing provenance and hiding the underlying query. This tension is the focus of the present paper, where we formalize and explore for the first time the tradeoff between the utility of presenting provenance information and the breach of privacy it poses with respect to the underlying query. Intuitively, our formalization is based on the notion of provenance abstraction, where the representation of some tuples in the provenance expressions is…
