Computing the Shapley Value of Facts in Query Answering
Daniel Deutch, Nave Frost, Benny Kimelfeld, Mikaël Monet

TL;DR
This paper introduces practical methods for computing the Shapley value of facts in query answering, building on probabilistic databases and data provenance, and evaluates them on the TPC-H and IMDB datasets.
Contribution
It presents the first practical algorithms for Shapley value computation in query answering, by connecting the problem to probabilistic query evaluation and applying knowledge compilation techniques.
Findings
Polynomial-time algorithm for tractable query classes
Effective inexact heuristic for faster Shapley computation
Demonstrated success on TPC-H and IMDB datasets
Abstract
The Shapley value is a game-theoretic notion for wealth distribution that is nowadays extensively used to explain complex data-intensive computation, for instance, in network analysis or machine learning. Recent theoretical works show that query evaluation over relational databases fits well in this explanation paradigm. Yet, these works fall short of providing practical solutions to the computational challenge inherent to the Shapley computation. We present in this paper two practically effective solutions for computing Shapley values in query answering. We start by establishing a tight theoretical connection to the extensively studied problem of query evaluation over probabilistic databases, which allows us to obtain a polynomial-time algorithm for the class of queries for which probability computation is tractable. We then propose a first practical solution for computing Shapley…
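To make the notion concrete, here is a minimal brute-force sketch of the Shapley value of each fact for a Boolean query, using the standard subset-weighted definition. It is not the paper's algorithm: it enumerates all subsets and so takes exponential time, which is exactly the cost the paper's methods aim to avoid. The function name `shapley_values`, the toy facts, and the query `q` are hypothetical choices for illustration, and every fact is treated as a player.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def shapley_values(facts, query):
    """Brute-force Shapley value of each fact for a Boolean query.

    facts: list of hashable facts (the players of the game).
    query: function mapping a frozenset of facts to True or False.
    Runs in exponential time; intended for tiny illustrative inputs only.
    """
    n = len(facts)
    values = {}
    for f in facts:
        others = [g for g in facts if g != f]
        total = Fraction(0)
        for k in range(n):
            # Weight of a coalition of size k that f joins: k!(n-k-1)!/n!
            weight = Fraction(factorial(k) * factorial(n - k - 1), factorial(n))
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of f to the query result on s
                marginal = int(query(s | {f})) - int(query(s))
                total += weight * marginal
        values[f] = total
    return values


# Toy database (assumed for illustration): R(a), R(b), S(a)
facts = [("R", "a"), ("R", "b"), ("S", "a")]


def q(db):
    # Boolean query: does some x satisfy both R(x) and S(x)?
    return any(("R", x) in db and ("S", x) in db for (_, x) in db)


result = shapley_values(facts, q)
```

In this toy instance the query holds only when R(a) and S(a) are both present, so those two facts share the credit equally and R(b) receives none. The values sum to the difference between the query result on the full database and on the empty one.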
Taxonomy
Topics: Scientific Computing and Data Management · Data Quality and Management · Semantic Web and Ontologies
