Computing the Shapley Value of Facts in Query Answering

Daniel Deutch; Nave Frost; Benny Kimelfeld; Mikaël Monet

arXiv:2112.08874·cs.DB·January 4, 2022

TL;DR

This paper introduces practical methods for computing the Shapley value of facts in query answering, building on probabilistic databases and data provenance, and reports success on the TPC-H and IMDB datasets.

Contribution

The paper presents the first practical algorithms for Shapley value computation in query answering, connecting the problem to probabilistic query evaluation and using knowledge compilation techniques.

Findings

01 Polynomial-time algorithm for tractable query classes

02 Effective inexact heuristic for faster Shapley computation

03 Demonstrated success on TPC-H and IMDB datasets

Abstract

The Shapley value is a game-theoretic notion for wealth distribution that is nowadays extensively used to explain complex data-intensive computation, for instance, in network analysis or machine learning. Recent theoretical works show that query evaluation over relational databases fits well in this explanation paradigm. Yet, these works fall short of providing practical solutions to the computational challenge inherent to the Shapley computation. We present in this paper two practically effective solutions for computing Shapley values in query answering. We start by establishing a tight theoretical connection to the extensively studied problem of query evaluation over probabilistic databases, which allows us to obtain a polynomial-time algorithm for the class of queries for which probability computation is tractable. We then propose a first practical solution for computing Shapley…
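To make the quantity in the abstract concrete, the sketch below computes the Shapley value of each fact for a Boolean query straight from the standard definition, by enumerating every subset of the other facts. It is not one of the paper's algorithms: it takes exponential time, which is the computational challenge the abstract refers to. The toy database, the query `q`, and the function name `shapley_values` are hypothetical, and the sketch assumes every fact is treated as a player.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def shapley_values(facts, query):
    """Exact Shapley value of each fact, by brute-force subset enumeration.

    `query` maps a set of facts to 0 or 1. Runtime is exponential in len(facts).
    """
    n = len(facts)
    values = {}
    for f in facts:
        others = [g for g in facts if g != f]
        total = Fraction(0)
        for k in range(n):
            # Probability that exactly these k facts precede f in a random order.
            weight = Fraction(factorial(k) * factorial(n - k - 1), factorial(n))
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of f when added to the subset s.
                total += weight * (query(s | {f}) - query(s))
        values[f] = total
    return values


# Hypothetical toy database: each fact is a (relation, tuple) pair.
FACTS = [("R", ("a",)), ("S", ("a", "b")), ("S", ("a", "c"))]


def q(db):
    """Boolean query: is there x, y with R(x) and S(x, y)? Returns 1 or 0."""
    r_values = {t[0] for rel, t in db if rel == "R"}
    return int(any(rel == "S" and t[0] in r_values for rel, t in db))


RESULT = shapley_values(FACTS, q)
for fact, value in RESULT.items():
    print(fact, value)
```

In this toy instance the single `R` fact is needed for every answer and receives 2/3, while the two interchangeable `S` facts receive 1/6 each. The values sum to 1, the query result on the full database.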

Code & Models

Repositories

navefr/shapleyfordbfacts (Official)

Taxonomy

Topics: Scientific Computing and Data Management · Data Quality and Management · Semantic Web and Ontologies