Relation-Stratified Sampling for Shapley Values Estimation in Relational Databases
Amirhossein Alizad, Mostafa Milani

TL;DR
This paper introduces Relation-Stratified Sampling (RSS) and its adaptive variant ARSS for efficient estimation of Shapley values in relational databases, leveraging schema structure to improve accuracy and reduce variance.
Contribution
The paper proposes a join-aware stratification method (RSS) and an adaptive sampling algorithm (ARSS) for more accurate and efficient tuple attribution in relational query analysis.
Findings
RSS and ARSS outperform classical Monte Carlo and size-based stratified sampling.
Relation-aware stratification and adaptive allocation provide complementary improvements.
ARSS is an effective, anytime estimator for database-centric Shapley attribution.
Abstract
Shapley-like values, including the Shapley and Banzhaf values, provide a principled way to quantify how individual tuples contribute to a query result. Their exact computation, however, is intractable because it requires aggregating marginal contributions over exponentially many permutations or subsets. While sampling-based estimators have been studied in cooperative game theory, their direct use for relational query answering remains underexplored and often ignores the structure of schemas and joins. We study tuple-level attribution for relational queries through sampling and introduce Relation-Stratified Sampling (RSS). Instead of stratifying coalitions only by size, RSS partitions the sample space by a relation-wise count vector that records how many tuples are drawn from each relation. This join-aware stratification concentrates samples on structurally valid and informative…
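To make the idea of stratifying by a relation-wise count vector concrete, here is a minimal Python sketch on a toy two-relation join. It is an illustration under stated assumptions, not the authors' implementation: the database `DB`, the counting query in `utility`, and all function names are hypothetical, every stratum gets the same number of samples (the adaptive allocation of ARSS is not modeled), and the stratum weights are derived from the standard Shapley distribution over coalitions rather than taken from the paper.

```python
import itertools
import math
import random

# Toy database (hypothetical): relation name -> list of (tuple_id, join_key).
DB = {
    "R": [("r1", 1), ("r2", 2)],
    "S": [("s1", 1), ("s2", 1), ("s3", 2)],
}


def utility(coalition):
    """Query result on a sub-database: number of R-S pairs with equal join key."""
    r_keys = [k for (tid, k) in DB["R"] if tid in coalition]
    s_keys = [k for (tid, k) in DB["S"] if tid in coalition]
    return sum(1 for a in r_keys for b in s_keys if a == b)


def others_by_relation(target):
    """Tuple ids of every relation, with the target tuple removed."""
    return {rel: [tid for (tid, _) in rows if tid != target]
            for rel, rows in DB.items()}


def strata(target):
    """Yield (count_vector, weight) for every relation-wise count vector.

    The weight is the probability that a coalition drawn from the Shapley
    distribution (uniform size, then uniform subset) has that count vector.
    """
    pools = others_by_relation(target)
    rels = sorted(pools)
    n = sum(len(p) for p in pools.values()) + 1  # all tuples, target included
    for counts in itertools.product(*(range(len(pools[r]) + 1) for r in rels)):
        ways = math.prod(math.comb(len(pools[r]), c) for r, c in zip(rels, counts))
        weight = ways / (n * math.comb(n - 1, sum(counts)))
        yield dict(zip(rels, counts)), weight


def stratified_shapley(target, samples_per_stratum=200, seed=0):
    """Estimate the target's Shapley value with equal sampling per stratum."""
    rng = random.Random(seed)
    pools = others_by_relation(target)
    estimate = 0.0
    for counts, weight in strata(target):
        total = 0.0
        for _ in range(samples_per_stratum):
            coalition = set()
            for rel, c in counts.items():
                # Within a stratum, draw c tuples uniformly from each relation.
                coalition.update(rng.sample(pools[rel], c))
            total += utility(coalition | {target}) - utility(coalition)
        estimate += weight * total / samples_per_stratum
    return estimate


def exact_shapley(target):
    """Brute-force Shapley value over all subsets (feasible only for toy data)."""
    others = [tid for rows in DB.values() for (tid, _) in rows if tid != target]
    n = len(others) + 1
    value = 0.0
    for size in range(n):
        w = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
        for subset in itertools.combinations(others, size):
            s = set(subset)
            value += w * (utility(s | {target}) - utility(s))
    return value
```

On this toy instance, `exact_shapley("r1")` equals 1, because `r1` participates in two join pairs and each pair's unit of credit is split evenly between its two tuples; `stratified_shapley("r1")` should land close to that value. Enumerating every count vector is only practical for small schemas, so a real estimator would need some way to choose which strata to sample and how heavily.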
Taxonomy
Topics: Advanced Database Systems and Queries · Data Management and Algorithms · Data Quality and Management
