Shapley Value on Uncertain Data
Zhuofan Jia, Jian Pei

TL;DR
This paper extends the Shapley value framework to probabilistic data, providing methods to estimate the expected contribution and variance of data owners' stochastic samples, with theoretical guarantees and practical algorithms.
Contribution
It introduces a novel probabilistic Shapley value framework, deriving unbiased estimators and developing efficient Monte Carlo algorithms for data valuation under uncertainty.
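For orientation, the quantities named above can be written down as follows. The notation here ($N$ for the set of $n$ data owners, $v$ for the utility function, $D_i \sim P_i$ for owner $i$'s sampled dataset) is assumed for illustration and may differ from the paper's own.

```latex
% Classical Shapley value of owner i for a fixed utility v:
\phi_i(v) \;=\; \sum_{S \subseteq N \setminus \{i\}}
  \frac{|S|!\,(n-|S|-1)!}{n!}\,
  \bigl[\, v(S \cup \{i\}) - v(S) \,\bigr]

% When each dataset is a random draw D_i \sim P_i, the utility of a
% coalition depends on the realized datasets, so \phi_i is itself random.
% The two quantities of interest are then
\mathbb{E}\bigl[\phi_i\bigr]
\qquad\text{and}\qquad
\operatorname{Var}\bigl[\phi_i\bigr],
% with expectation and variance taken over D_1 \sim P_1, \dots, D_n \sim P_n.
```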
Findings
The proposed estimators achieve stronger accuracy-efficiency trade-offs.
The stratified pooled estimator significantly reduces variance.
The methods are validated on synthetic and real datasets.
Abstract
The Shapley value provides a principled framework for fairly distributing rewards among participants according to their individual contributions. While prior work has applied this concept to data valuation in machine learning, existing formulations overwhelmingly assume that each participant contributes a fixed, deterministic dataset. In practice, however, data owners often provide samples drawn from underlying probabilistic distributions, introducing stochasticity into their marginal contributions and rendering the Shapley value itself a random variable. This work addresses this gap by proposing a framework for the Shapley value of probabilistic data distributions that quantifies both the expected contribution and the variance of each participant, thereby capturing uncertainty induced by random sampling. We develop theoretical and empirical methodologies for estimating these…
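To make the idea of a random Shapley value concrete, here is a minimal sketch of a naive nested Monte Carlo procedure: redraw every owner's dataset, compute the exact Shapley values for that realization, and summarize the results with a sample mean and sample variance. This is not the paper's estimator (the stratified pooled estimator is not reproduced here), and `toy_utility`, `make_gaussian_sampler`, and the Gaussian distributions are hypothetical choices made only for illustration.

```python
import itertools
import math
import random
import statistics


def exact_shapley(datasets, utility):
    """Exact Shapley values for one fixed realization of the owners' datasets."""
    n = len(datasets)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Weight of a coalition of size k that excludes owner i.
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for subset in itertools.combinations(others, k):
                without_i = [datasets[j] for j in subset]
                with_i = without_i + [datasets[i]]
                values[i] += weight * (utility(with_i) - utility(without_i))
    return values


def shapley_mean_and_variance(samplers, utility, num_draws, rng):
    """Redraw all datasets num_draws times; return per-owner sample mean and variance."""
    draws = [exact_shapley([s(rng) for s in samplers], utility) for _ in range(num_draws)]
    per_owner = list(zip(*draws))
    means = [statistics.fmean(col) for col in per_owner]
    variances = [statistics.variance(col) for col in per_owner]
    return means, variances


def toy_utility(coalition, target=0.0):
    """Hypothetical utility: negative squared error of the pooled mean (assumption)."""
    pooled = [x for data in coalition for x in data]
    if not pooled:
        return -1.0  # arbitrary baseline score for the empty coalition
    return -(statistics.fmean(pooled) - target) ** 2


def make_gaussian_sampler(mu, sigma, size):
    """Hypothetical data owner: draws `size` points from N(mu, sigma^2)."""
    return lambda rng: [rng.gauss(mu, sigma) for _ in range(size)]


samplers = [
    make_gaussian_sampler(0.0, 1.0, 20),  # clean, well-centered owner
    make_gaussian_sampler(0.0, 3.0, 20),  # noisy owner
    make_gaussian_sampler(0.5, 1.0, 20),  # biased owner
]
means, variances = shapley_mean_and_variance(samplers, toy_utility, 200, random.Random(0))
for i, (m, v) in enumerate(zip(means, variances)):
    print(f"owner {i}: mean Shapley = {m:.4f}, variance = {v:.6f}")
```

Exact enumeration costs on the order of $2^n$ utility evaluations per owner for each realization, so this sketch is only practical for a handful of owners; that cost is the reason Monte Carlo approximations of the inner sum matter in practice.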
Taxonomy
Topics: Mobile Crowdsensing and Crowdsourcing · Data Quality and Management · Ethics and Social Impacts of AI
