Advancing Fact Attribution for Query Answering: Aggregate Queries and Novel Algorithms
Omer Abramovich, Daniel Deutch, Nave Frost, Ahmet Kara, Dan Olteanu

TL;DR
This paper presents a practical method for computing input tuple contributions to query results, including aggregate queries, using novel optimizations that significantly improve runtime performance over previous approaches.
Contribution
It introduces the first practical approach for aggregate query attribution using Banzhaf and Shapley values, with two key optimizations for efficiency.
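For orientation, the two attribution measures named above are commonly defined as follows for a lineage $\varphi$ over a set $X$ of $n$ input tuples, where $\varphi(S)$ is the query result when only the tuples in $S$ are present. This uses the unnormalized convention for the Banzhaf value (a normalized variant divides by $2^{n-1}$); the symbols $\varphi$, $X$, $S$, and $n$ are notation assumed here, not taken from the summary, and the paper's exact definitions for aggregate queries may differ.

```latex
\mathrm{Banzhaf}(\varphi, x) = \sum_{S \subseteq X \setminus \{x\}} \bigl( \varphi(S \cup \{x\}) - \varphi(S) \bigr)

\mathrm{Shapley}(\varphi, x) = \sum_{S \subseteq X \setminus \{x\}} \frac{|S|!\,(n - |S| - 1)!}{n!} \bigl( \varphi(S \cup \{x\}) - \varphi(S) \bigr)
```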
Findings
Achieves up to 1000x faster runtimes than previous methods for non-aggregate queries.
Demonstrates practicality of attribution for aggregate queries on large datasets.
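Two optimizations drive the speedup: computing the contribution only once for input tuples that contribute equally, and using the gradient of the query lineage to obtain the contributions of all tuples at the cost of computing one.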
Abstract
In this paper, we introduce a novel approach to computing the contribution of input tuples to the result of the query, quantified by the Banzhaf and Shapley values. In contrast to prior algorithmic work that focuses on Select-Project-Join-Union queries, ours is the first practical approach for queries with aggregates. It relies on two novel optimizations that are essential for its practicality and significantly improve the runtime performance already for queries without aggregates. The first optimization exploits the observation that many input tuples have the same contribution to the query result, so it is enough to compute the contribution of one of them. The second optimization uses the gradient of the query lineage to compute the contributions of all tuples with the same complexity as for one of them. Experiments with a million instances over 3 databases show that our approach…
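The two observations in the abstract can be checked on a toy example. The sketch below is a brute-force illustration only, not the paper's algorithm: the lineage `r1 AND (s1 OR s2)`, the fact names, and all function names are hypothetical. It shows that the interchangeable facts `s1` and `s2` receive identical values, and that the Banzhaf value of a fact equals the partial derivative of the lineage's multilinear extension at the point 1/2, scaled by 2^(n-1). It enumerates all subsets, so it demonstrates the identities rather than the efficiency gain.

```python
from itertools import combinations
from math import factorial


def lineage(present):
    # Hypothetical Boolean lineage: (r1 AND s1) OR (r1 AND s2).
    return ("r1" in present and "s1" in present) or (
        "r1" in present and "s2" in present
    )


def attributions(phi, facts):
    """Brute-force (unnormalized) Banzhaf and Shapley values of each fact."""
    n = len(facts)
    banzhaf, shapley = {}, {}
    for f in facts:
        others = [g for g in facts if g != f]
        b, s = 0, 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                chosen = set(subset)
                # Marginal contribution of f when added to this subset.
                marginal = int(phi(chosen | {f})) - int(phi(chosen))
                b += marginal
                s += weight * marginal
        banzhaf[f], shapley[f] = b, s
    return banzhaf, shapley


def multilinear(phi, facts, p):
    """Multilinear extension: expected value of phi when each fact f
    is present independently with probability p[f]."""
    total = 0.0
    for k in range(len(facts) + 1):
        for subset in combinations(facts, k):
            chosen = set(subset)
            weight = 1.0
            for f in facts:
                weight *= p[f] if f in chosen else 1.0 - p[f]
            total += weight * int(phi(chosen))
    return total


def banzhaf_from_gradient(phi, facts):
    """Banzhaf values read off the gradient of the multilinear extension.
    The extension is linear in each p[f], so the partial derivative is
    exactly the difference between p[f] = 1 and p[f] = 0."""
    n = len(facts)
    half = {f: 0.5 for f in facts}
    result = {}
    for f in facts:
        hi = multilinear(phi, facts, {**half, f: 1.0})
        lo = multilinear(phi, facts, {**half, f: 0.0})
        result[f] = round((hi - lo) * 2 ** (n - 1))
    return result


facts = ["r1", "s1", "s2"]
banzhaf, shapley = attributions(lineage, facts)
gradient_banzhaf = banzhaf_from_gradient(lineage, facts)
```

Here `banzhaf` and `gradient_banzhaf` agree, and `s1` and `s2` share the same Banzhaf and Shapley values, so in this toy case one computation would suffice for both.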
Taxonomy
Topics: Advanced Database Systems and Queries · Data Management and Algorithms · Semantic Web and Ontologies
