Database Queries that Explain their Work
James Cheney, Amal Ahmed, Umut A. Acar

TL;DR
This paper introduces a formal provenance tracing model for nested relational calculus queries, enabling explanation and reproducibility through trace slicing and differencing, with proven correctness and a Haskell implementation.
Contribution
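The paper gives a formal provenance model for multiset-valued Nested Relational Calculus, with trace slicing algorithms that extract the subtraces needed to explain or recompute specific parts of a query's output. This supplies the kind of formal definitions and guarantees that most provenance systems lack.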
Findings
Provenance traces can be formally modeled for nested relational queries.
Trace slicing algorithms extract the subtraces needed to explain or recompute specific parts of the output.
Correctness of slicing and differencing techniques is formally proven.
Abstract
Provenance for database queries or scientific workflows is often motivated as providing explanation, increasing understanding of the underlying data sources and processes used to compute the query, and reproducibility, the capability to recompute the results on different inputs, possibly specialized to a part of the output. Many provenance systems claim to provide such capabilities; however, most lack formal definitions or guarantees of these properties, while others provide formal guarantees only for relatively limited classes of changes. Building on recent work on provenance traces and slicing for functional programming languages, we introduce a detailed tracing model of provenance for multiset-valued Nested Relational Calculus, define trace slicing algorithms that extract subtraces needed to explain or recompute specific parts of the output, and define query slicing and differencing…
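To make the idea of tracing and slicing concrete, here is a minimal Python sketch for a single filter-and-project query over a list of rows. It is an illustration only, not the paper's tracing model or its slicing algorithm: the names `TraceStep`, `run_traced`, `slice_trace`, and `replay` are hypothetical, and the toy handles neither nesting nor the full Nested Relational Calculus.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Set, Tuple

Row = Dict[str, Any]


@dataclass
class TraceStep:
    """One recorded evaluation step (hypothetical structure for this sketch)."""
    in_index: int              # position of the input row that was examined
    kept: bool                 # whether the filter predicate held for that row
    out_index: Optional[int]   # position of the produced output, if any


def run_traced(
    rows: List[Row],
    pred: Callable[[Row], bool],
    proj: Callable[[Row], Any],
) -> Tuple[List[Any], List[TraceStep]]:
    """Evaluate [proj(r) for r in rows if pred(r)] and record a trace."""
    out: List[Any] = []
    trace: List[TraceStep] = []
    for i, row in enumerate(rows):
        if pred(row):
            trace.append(TraceStep(i, True, len(out)))
            out.append(proj(row))
        else:
            trace.append(TraceStep(i, False, None))
    return out, trace


def slice_trace(trace: List[TraceStep], wanted: Set[int]) -> List[TraceStep]:
    """Keep only the steps that produced an output position in `wanted`."""
    return [s for s in trace if s.kept and s.out_index in wanted]


def replay(
    rows: List[Row],
    sliced: List[TraceStep],
    proj: Callable[[Row], Any],
) -> Dict[int, Any]:
    """Recompute the selected outputs, reading only rows the slice mentions.

    Simplification: the predicate is not rechecked, so the replay is only
    meaningful if the changed input still satisfies it for the sliced rows.
    """
    return {s.out_index: proj(rows[s.in_index]) for s in sliced}


rows = [
    {"name": "a", "qty": 5},
    {"name": "b", "qty": 0},
    {"name": "c", "qty": 7},
]
output, trace = run_traced(rows, lambda r: r["qty"] > 0, lambda r: r["name"])
sliced = slice_trace(trace, {1})           # explain only output position 1
partial = replay(rows, sliced, lambda r: r["name"])
```

In this toy, the slice for output position 1 mentions only input row 2, so changing rows 0 or 1 leaves the replayed result unchanged. That is the intuition behind using a subtrace to explain or recompute one part of the output.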
