Provenance Traces
James Cheney, Umut Acar, Amal Ahmed

TL;DR
Provenance traces are a formal model, grounded in operational semantics, for recording how the results of database queries are derived, with consistency and fidelity guarantees on the explanations they provide.
Contribution
The paper introduces provenance traces as a formal, operational semantics-based model for provenance in the nested relational calculus, unifying and strengthening previous approaches.
Findings
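Three pre-existing forms of provenance for the NRC can be extracted from provenance traces.
Traces satisfy two semantic guarantees: consistency (the trace describes what actually happened during execution) and fidelity (the trace explains how the expression would behave if the input were changed).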
They provide a foundation for comparing and unifying provenance models.
Abstract
Provenance is information about the origin, derivation, ownership, or history of an object. It has recently been studied extensively in scientific databases and other settings due to its importance in helping scientists judge data validity, quality and integrity. However, most models of provenance have been stated as ad hoc definitions motivated by informal concepts such as "comes from", "influences", "produces", or "depends on". These models lack clear formalizations describing in what sense the definitions capture these intuitive concepts. This makes it difficult to compare approaches, evaluate their effectiveness, or argue about their validity. We introduce provenance traces, a general form of provenance for the nested relational calculus (NRC), a core database query language. Provenance traces can be thought of as concrete data structures representing the operational semantics…
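To make the idea of a trace as a concrete data structure more tangible, here is a minimal sketch of a tracing evaluator. It is an illustration under strong assumptions, not the paper's semantics: the toy language (integer constants, variables, addition, conditionals) is a stand-in for the NRC, and every name in it (`eval_traced`, `replay`, `TIf`, `ReplayMismatch`, and so on) is hypothetical. Evaluation returns both an ordinary result and a trace of the steps taken; `replay` re-runs a trace and fails if the recorded branch no longer matches, which loosely mirrors the notion that a trace describes what actually happened.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union

# --- Expressions of a toy language (a stand-in, NOT the NRC itself) ---
@dataclass(frozen=True)
class Const:
    value: int

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"

@dataclass(frozen=True)
class If:
    cond: "Expr"
    then: "Expr"
    orelse: "Expr"

Expr = Union[Const, Var, Add, If]

# --- Traces: one node per evaluation step (hypothetical shapes) ---
@dataclass(frozen=True)
class TConst:
    value: int

@dataclass(frozen=True)
class TVar:
    name: str

@dataclass(frozen=True)
class TAdd:
    left: "Trace"
    right: "Trace"

@dataclass(frozen=True)
class TIf:
    cond: "Trace"
    taken: bool      # which branch was followed during evaluation
    branch: "Trace"  # trace of the branch that was followed

Trace = Union[TConst, TVar, TAdd, TIf]
Env = Dict[str, int]

class ReplayMismatch(Exception):
    """Raised when a trace's recorded branch disagrees with the environment."""

def eval_traced(e: Expr, env: Env) -> Tuple[int, Trace]:
    """Evaluate e in env, returning the result and a trace of the evaluation."""
    if isinstance(e, Const):
        return e.value, TConst(e.value)
    if isinstance(e, Var):
        return env[e.name], TVar(e.name)
    if isinstance(e, Add):
        v1, t1 = eval_traced(e.left, env)
        v2, t2 = eval_traced(e.right, env)
        return v1 + v2, TAdd(t1, t2)
    if isinstance(e, If):
        c, tc = eval_traced(e.cond, env)
        taken = c != 0  # nonzero counts as true in this toy language
        v, tb = eval_traced(e.then if taken else e.orelse, env)
        return v, TIf(tc, taken, tb)
    raise TypeError(f"unknown expression: {e!r}")

def replay(t: Trace, env: Env) -> int:
    """Re-run a trace against env; fail if control flow would differ."""
    if isinstance(t, TConst):
        return t.value
    if isinstance(t, TVar):
        return env[t.name]
    if isinstance(t, TAdd):
        return replay(t.left, env) + replay(t.right, env)
    if isinstance(t, TIf):
        if (replay(t.cond, env) != 0) != t.taken:
            raise ReplayMismatch("recorded branch does not match this input")
        return replay(t.branch, env)
    raise TypeError(f"unknown trace: {t!r}")

# Example: if x then y + 1 else 0
query = If(Var("x"), Add(Var("y"), Const(1)), Const(0))
env = {"x": 1, "y": 41}
value, trace = eval_traced(query, env)
```

Replaying `trace` against the original `env` reproduces `value`. Replaying it against an input that flips the condition raises `ReplayMismatch`, because the trace only records the branch that was actually taken. The paper's consistency and fidelity properties are stated for its own tracing semantics and are not captured by this simplification.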
Taxonomy
Topics: Scientific Computing and Data Management · Semantic Web and Ontologies · Data Quality and Management
