Privacy Aspects of Provenance Queries
Tanja Auge, Nic Scharlau, Andreas Heuer

TL;DR
This paper examines the privacy challenges of publishing provenance information for database query results, highlights the conflict between transparency and privacy, and proposes initial solutions for protecting sensitive data.
Contribution
The paper extends the notion of privacy to include intellectual property protection in the context of provenance queries, and discusses fundamental problems and potential solutions.
Findings
Publishing provenance can reveal more data than necessary, risking privacy violations.
Provenance information may include quasi-identifiers, leading to privacy breaches.
Fundamental issues in balancing provenance transparency and privacy are identified.
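The quasi-identifier risk can be checked mechanically. The sketch below tests whether a set of published witness tuples satisfies k-anonymity over chosen quasi-identifier columns. k-anonymity is a standard anonymity criterion used here purely as an illustration and is not necessarily the solution the paper proposes; the function `is_k_anonymous`, the column names, and the rows are all hypothetical.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values that occurs
    in rows is shared by at least k rows (the classical k-anonymity condition)."""
    counts = Counter(tuple(row[a] for a in quasi_identifiers) for row in rows)
    return all(c >= k for c in counts.values())


# Hypothetical witness tuples that a provenance query might publish.
witness_rows = [
    {"zip": "18051", "birth_year": 1980, "diagnosis": "flu"},
    {"zip": "18051", "birth_year": 1980, "diagnosis": "asthma"},
    {"zip": "18055", "birth_year": 1975, "diagnosis": "flu"},
]
qi = ("zip", "birth_year")  # assumed quasi-identifier columns

# ("18055", 1975) occurs only once, so that person is re-identifiable.
print(is_k_anonymous(witness_rows, qi, k=2))
```

A single unique combination of quasi-identifier values is enough to fail the check, which mirrors the finding that provenance can leak identifying information even when no explicit identifier is published.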
Abstract
Given the result of a query over a large database, why-provenance can be used to compute the part of the database needed to derive that result, consisting of so-called witnesses. If the database contains personal data, privacy protection has to prevent the publication of these witnesses. This creates a natural conflict of interest between publishing the original data (provenance) and protecting it (privacy). In this paper, privacy goes beyond personal data protection: the paper extends the definition of privacy to cover intellectual property protection. If the provenance information is not sufficient to reconstruct a query result, additional data such as witnesses or provenance polynomials have to be published to guarantee traceability. Nevertheless, publishing this provenance information might be a problem if (significantly) more tuples than necessary can be derived from the original…
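To illustrate witnesses, provenance polynomials, and the risk of over-disclosure, the following minimal sketch evaluates a single join query over a toy database and records both kinds of provenance for each result tuple. The relations `R` and `S`, the query, and the helpers `evaluate_with_provenance` and `tuples_revealed` are hypothetical and chosen only for illustration; this is not the paper's implementation.

```python
from collections import defaultdict

# Hypothetical toy database: every tuple carries an id that acts as its provenance token.
# R(name, dept) holds personal data; S(dept, city) maps departments to locations.
R = {"r1": ("Alice", "Sales"), "r2": ("Bob", "Sales"), "r3": ("Carol", "IT")}
S = {"s1": ("Sales", "Rostock"), "s2": ("IT", "Rostock"), "s3": ("IT", "Berlin")}


def evaluate_with_provenance(R, S):
    """Evaluate Q(city) :- R(name, dept), S(dept, city).

    Returns, per result tuple, its why-provenance (a set of witnesses, each a
    frozenset of tuple ids) and its provenance polynomial (a sum of products).
    """
    why = defaultdict(set)
    monomials = defaultdict(list)
    for rid, (_name, dept) in R.items():
        for sid, (dept2, city) in S.items():
            if dept == dept2:  # join condition R.dept = S.dept
                why[(city,)].add(frozenset({rid, sid}))
                monomials[(city,)].append(f"{rid}*{sid}")
    poly = {t: " + ".join(ms) for t, ms in monomials.items()}
    return dict(why), poly


def tuples_revealed(witnesses):
    """Ids of all base tuples exposed when every witness is published."""
    return set().union(*witnesses)


why, poly = evaluate_with_provenance(R, S)
rostock = ("Rostock",)
print(poly[rostock])                          # r1*s1 + r2*s1 + r3*s2
print(sorted(tuples_revealed(why[rostock])))  # ['r1', 'r2', 'r3', 's1', 's2']

# A single witness already justifies the result tuple:
print(sorted(min(why[rostock], key=sorted)))  # ['r1', 's1']
```

Publishing the full why-provenance of `("Rostock",)` exposes all three employees in `R`, although one witness of two tuples would suffice to trace the result. This gap between what is sufficient and what is derivable is the over-disclosure problem the abstract describes.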
