Provenance as Dependency Analysis
James Cheney, Amal Ahmed, and Umut Acar

TL;DR
This paper uses dependency analysis to give data provenance a formal foundation. It introduces a semantic characterization of dependency provenance, shows that exact dependency provenance is not computable, and therefore proposes dynamic and static approximation techniques.
Contribution
The paper applies dependency analysis techniques from program analysis and program slicing to formalize provenance, providing a semantic characterization along with dynamic and static approximation techniques.
Findings
Dependency provenance is semantically characterized.
Exact dependency provenance is non-computable.
Dynamic and static approximation methods are proposed.
Abstract
Provenance is information recording the source, derivation, or history of some information. Provenance tracking has been studied in a variety of settings; however, although many design points have been explored, the mathematical or semantic foundations of data provenance have received comparatively little attention. In this paper, we argue that dependency analysis techniques familiar from program analysis and program slicing provide a formal foundation for forms of provenance that are intended to show how (part of) the output of a query depends on (parts of) its input. We introduce a semantic characterization of such dependency provenance, show that this form of provenance is not computable, and provide dynamic and static approximation techniques.
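To make the idea of a dynamic approximation concrete, here is a minimal sketch in Python. It is not the paper's formal semantics or query language: the `Tracked` class, the operation names, and the input labels `x`, `y`, `z` are all hypothetical, chosen only for illustration. Each value carries the set of input labels it may depend on, and every operation propagates the union of its operands' labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tracked:
    """A value paired with the set of input labels it may depend on (hypothetical type)."""
    value: int
    deps: frozenset = frozenset()


def add(a: Tracked, b: Tracked) -> Tracked:
    # The sum may change if either operand changes, so take the union.
    return Tracked(a.value + b.value, a.deps | b.deps)


def mul(a: Tracked, b: Tracked) -> Tracked:
    # Same propagation rule as add: union of both operands' labels.
    return Tracked(a.value * b.value, a.deps | b.deps)


def if_positive(cond: Tracked, then_val: Tracked, else_val: Tracked) -> Tracked:
    # The result depends on the branch taken and on the condition that chose it.
    chosen = then_val if cond.value > 0 else else_val
    return Tracked(chosen.value, chosen.deps | cond.deps)


# Three labeled inputs (labels are assumptions for this example).
x = Tracked(3, frozenset({"x"}))
y = Tracked(0, frozenset({"y"}))
z = Tracked(5, frozenset({"z"}))

product = mul(x, y)
result = if_positive(x, add(y, z), z)

print(product.value, sorted(product.deps))
print(result.value, sorted(result.deps))
```

In this run `product` is reported as depending on both `x` and `y`, even though with `y` fixed at 0 no change to `x` can alter the product. The tracked set is therefore a safe over-approximation rather than the exact dependency set, which is the kind of trade-off the non-computability result makes unavoidable.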
Taxonomy
Topics: Scientific Computing and Data Management · Distributed and Parallel Computing Systems · Research Data Management Practices
