You Say 'What', I Hear 'Where' and 'Why': (Mis-)Interpreting SQL to Derive Fine-Grained Provenance
Tobias Müller, Benjamin Dietrich, Torsten Grust

TL;DR
This paper derives fine-grained data provenance for SQL by transforming each query into its own interpreter, which reveals where a piece of the output originated in the input and why it appears in the result, even for complex SQL features.
Contribution
It presents a non-invasive, compositional, source-level SQL rewriting technique that enables provenance analysis at the level of individual table cells for rich SQL dialects without overwhelming the underlying database system.
Findings
Provenance can be derived for recursive queries, windowed aggregates, and user-defined functions.
Because the rewrite preserves the shape of both data and query, provenance derivation scales to complex queries.
For each piece of the output, the approach discloses where it originated in the input and why it found its way into the result.
Abstract
SQL declaratively specifies what the desired output of a query is. This work shows that a non-standard interpretation of the SQL semantics can, instead, disclose where a piece of the output originated in the input and why that piece found its way into the result. We derive such data provenance for very rich SQL dialects (including recursion, windowed aggregates, and user-defined functions) at the fine-grained level of individual table cells. The approach is non-invasive and implemented as a compositional source-level SQL rewrite: an input SQL query is transformed into its own interpreter that wields data dependencies instead of regular values. We deliberately design this transformation to preserve the shape of both data and query, which allows provenance derivation to scale to complex queries without overwhelming the underlying database system.
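To make the idea of "dependencies instead of regular values" concrete, here is a toy Python sketch. It is not the paper's SQL rewrite: the table `t`, the query, the `Dep` record, and the helper names are all hypothetical and chosen only for illustration. It evaluates the same small query twice, once with regular semantics and once so that every output cell carries the set of input cells it was copied from (where) and the set of input cells inspected to decide that it qualifies (why).

```python
from typing import NamedTuple

# Hypothetical convention: an input cell is identified as (table, row number, column).
class Dep(NamedTuple):
    where: frozenset  # input cells the output value was copied from
    why: frozenset    # input cells inspected to let the value into the result

# Hypothetical input table t(id, name, score).
ROWS = [
    {"id": 1, "name": "ann", "score": 70},
    {"id": 2, "name": "bob", "score": 40},
    {"id": 3, "name": "cyd", "score": 90},
]

def cell(row_number, column):
    return ("t", row_number, column)

def run_values(rows):
    """Regular semantics of: SELECT name FROM t WHERE score > 50."""
    return [r["name"] for r in rows if r["score"] > 50]

def run_dependencies(rows):
    """Same query shape, but each output cell is a Dep rather than a value.

    Simplification: the predicate is still decided on the real score here.
    """
    out = []
    for i, r in enumerate(rows, start=1):
        if r["score"] > 50:
            out.append(Dep(where=frozenset({cell(i, "name")}),
                           why=frozenset({cell(i, "score")})))
    return out

values = run_values(ROWS)
deps = run_dependencies(ROWS)
for value, dep in zip(values, deps):
    print(value, sorted(dep.where), sorted(dep.why))
```

Both functions have the same structure and return one entry per qualifying row, which mirrors, in a very reduced form, the abstract's point that the transformation preserves the shape of both data and query.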
Taxonomy
Topics: Scientific Computing and Data Management · Distributed and Parallel Computing Systems · Web Data Mining and Analysis
