Shapley Value on Uncertain Data

Zhuofan Jia; Jian Pei

arXiv:2601.14543 · cs.GT · January 22, 2026

TL;DR

This paper extends the Shapley value framework to probabilistic data, providing methods to estimate the expected contribution and variance of data owners' stochastic samples, with theoretical guarantees and practical algorithms.

Contribution

The paper introduces a probabilistic Shapley value framework, derives unbiased estimators, and develops efficient Monte Carlo algorithms for data valuation under uncertainty.
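As a reading aid, the setting can be written down in a few lines. The first equation is the standard Shapley value; the remaining notation ($D_j$, $P_j$, $U$, $\Phi_i$, $\mu_i$, $\sigma_i^2$) is assumed here for illustration and may differ from the symbols used in the paper.

```latex
% Standard Shapley value of player i in a game (N, v) with n = |N| players.
\[
  \phi_i(v) \;=\; \sum_{S \subseteq N \setminus \{i\}}
    \frac{|S|!\,(n - |S| - 1)!}{n!}\,
    \bigl[\, v(S \cup \{i\}) - v(S) \,\bigr]
\]

% Assumed notation: owner j contributes a random dataset D_j drawn from P_j,
% and U scores the pooled data of a coalition, so the game itself is random.
\[
  v_D(S) \;=\; U\Bigl(\,\bigcup_{j \in S} D_j\Bigr),
  \qquad D_j \sim P_j \ \text{independently}
\]

% The Shapley value is then a random variable; the quantities of interest
% are its mean and its variance over the sampling of the datasets.
\[
  \Phi_i \;=\; \phi_i(v_D),
  \qquad
  \mu_i \;=\; \mathbb{E}\bigl[\Phi_i\bigr],
  \qquad
  \sigma_i^2 \;=\; \operatorname{Var}\bigl(\Phi_i\bigr)
\]
```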

Findings

01. The proposed estimators achieve stronger accuracy-efficiency trade-offs.

02. The stratified pooled estimator significantly reduces variance.

03. The methods are validated on synthetic and real datasets.

Abstract

The Shapley value provides a principled framework for fairly distributing rewards among participants according to their individual contributions. While prior work has applied this concept to data valuation in machine learning, existing formulations overwhelmingly assume that each participant contributes a fixed, deterministic dataset. In practice, however, data owners often provide samples drawn from underlying probabilistic distributions, introducing stochasticity into their marginal contributions and rendering the Shapley value itself a random variable. This work addresses this gap by proposing a framework for the Shapley value of probabilistic data distributions that quantifies both the expected contribution and the variance of each participant, thereby capturing uncertainty induced by random sampling. We develop theoretical and empirical methodologies for estimating these…
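To make the idea of a random Shapley value concrete, here is a minimal sketch of a naive baseline: redraw every owner's dataset many times, compute exact Shapley values for each draw, and report the sample mean and sample variance. This is not the paper's estimator. The Gaussian data model, the toy utility in `make_utility`, and all function names are assumptions chosen for illustration.

```python
import itertools
import math
import random
import statistics


def exact_shapley(players, utility):
    """Exact Shapley values by enumerating all coalitions (small games only)."""
    n = len(players)
    values = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain i.
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for subset in itertools.combinations(others, k):
                s = frozenset(subset)
                total += weight * (utility(s | {i}) - utility(s))
        values[i] = total
    return values


def make_utility(datasets, target=0.0):
    """Hypothetical utility: how close the pooled sample mean is to `target`."""
    def utility(coalition):
        pooled = [x for p in coalition for x in datasets[p]]
        if not pooled:
            return 0.0  # the empty coalition earns nothing
        err = statistics.fmean(pooled) - target
        return 1.0 / (1.0 + err * err)
    return utility


def shapley_mean_and_variance(distributions, n_draws, rng):
    """Naive Monte Carlo over dataset draws.

    `distributions` maps owner -> (mu, sigma, sample_size); each owner's data
    is assumed Gaussian purely for illustration. Requires n_draws >= 2.
    """
    history = {owner: [] for owner in distributions}
    for _ in range(n_draws):
        datasets = {
            owner: [rng.gauss(mu, sigma) for _ in range(m)]
            for owner, (mu, sigma, m) in distributions.items()
        }
        phi = exact_shapley(list(distributions), make_utility(datasets))
        for owner, value in phi.items():
            history[owner].append(value)
    return {
        owner: (statistics.fmean(vals), statistics.variance(vals))
        for owner, vals in history.items()
    }


if __name__ == "__main__":
    rng = random.Random(0)
    dists = {"A": (0.0, 1.0, 20), "B": (0.5, 1.0, 20), "C": (3.0, 1.0, 20)}
    for owner, (mean, var) in shapley_mean_and_variance(dists, 200, rng).items():
        print(f"{owner}: mean={mean:.3f} variance={var:.5f}")
```

The exact enumeration costs on the order of 2^n utility evaluations per draw, so this baseline is only practical for a handful of owners. That cost is the reason more efficient Monte Carlo estimators are worth developing.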

Taxonomy

Topics: Mobile Crowdsensing and Crowdsourcing · Data Quality and Management · Ethics and Social Impacts of AI