Uncertainty Quantification of Data Shapley via Statistical Inference
Mengmeng Wu, Zhihong Liu, Xiang Li, Ruoxi Jia, Xiangyu Chang

TL;DR
This paper enhances Data Shapley for real-world, evolving datasets by quantifying uncertainty through statistical inference, enabling confidence intervals and practical decision-making in data valuation.
Contribution
It establishes a connection between Data Shapley and infinite-order U-statistics. It uses that connection to address Data Shapley's assumption of a fixed dataset, quantifying how Data Shapley values vary as the underlying data distribution changes.
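For reference, the standard Data Shapley value of point i in a dataset D of n points, under a utility U (for example, the validation accuracy of a model trained on a subset), averages i's marginal contributions over subset sizes. The formula below is the usual definition, written out so the U-statistic link is visible:

```latex
\phi_i(D) \;=\; \frac{1}{n} \sum_{k=0}^{n-1} \binom{n-1}{k}^{-1}
\sum_{\substack{S \subseteq D \setminus \{i\} \\ |S| = k}}
\bigl[\, U(S \cup \{i\}) - U(S) \,\bigr]
```

For a fixed size k, the inner average runs over all size-k subsets of the remaining points. If the points are i.i.d. draws from a distribution, that average has the form of a U-statistic of order k. Because k ranges up to n - 1, the order grows with the sample size, which is our reading of the "infinite-order U-statistics" the paper refers to; the paper's precise construction may differ.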
Findings
Proves asymptotic normality of Data Shapley estimates.
Develops two algorithms for estimating the uncertainty of Data Shapley values.
Demonstrates practical application in data trading scenarios.
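This summary does not describe the paper's two algorithms. As a hedged illustration of how an asymptotic-normality result turns into a confidence interval, the sketch below uses a generic permutation-sampling estimator of one point's Shapley value with a normal-approximation (CLT) interval. The function name `permutation_shapley_ci` and its parameters are hypothetical. It quantifies only Monte Carlo sampling error for a fixed dataset, not the distribution-level uncertainty the paper targets.

```python
import math
import random
from typing import Callable, FrozenSet

# A utility maps a set of data-point indices to a score (e.g. validation accuracy).
Utility = Callable[[FrozenSet[int]], float]


def permutation_shapley_ci(
    i: int,
    n: int,
    utility: Utility,
    num_perms: int = 2000,
    seed: int = 0,
    z: float = 1.96,
) -> tuple[float, float, float]:
    """Monte Carlo Shapley estimate for point i with a normal-approximation CI.

    Each sample draws a uniformly random ordering of 0..n-1 and records
    U(P | {i}) - U(P), where P is the set of points preceding i. The sample
    mean is unbiased for the Shapley value; by the CLT,
    mean +/- z * sd / sqrt(m) is an approximate confidence interval.
    """
    rng = random.Random(seed)
    order = list(range(n))
    samples = []
    for _ in range(num_perms):
        rng.shuffle(order)
        pos = order.index(i)
        before = frozenset(order[:pos])  # points that arrive before i
        samples.append(utility(before | {i}) - utility(before))
    m = len(samples)
    mean = sum(samples) / m
    var = sum((x - mean) ** 2 for x in samples) / (m - 1)  # sample variance
    half = z * math.sqrt(var / m)  # half-width of the interval
    return mean, mean - half, mean + half


# Usage with a hypothetical saturating utility: by symmetry each of the
# 4 points has Shapley value U(D) / n = 2 / 4 = 0.5.
mean, lo, hi = permutation_shapley_ci(0, 4, lambda S: float(min(len(S), 2)))
print(f"estimate={mean:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

The interval width shrinks like 1/sqrt(num_perms), which is the practical payoff of a normality result: it tells you how many samples a target precision needs.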
Abstract
As data plays an increasingly pivotal role in decision-making, the emergence of data markets underscores the growing importance of data valuation. Within the machine learning landscape, Data Shapley stands out as a widely embraced method for data valuation. However, a limitation of Data Shapley is its assumption of a fixed dataset, contrasting with the dynamic nature of real-world applications where data constantly evolves and expands. This paper establishes the relationship between Data Shapley and infinite-order U-statistics and addresses this limitation by quantifying the uncertainty of Data Shapley with changes in data distribution from the perspective of U-statistics. We make statistical inferences on data valuation to obtain confidence intervals for the estimations. We construct two different algorithms to estimate this uncertainty and provide recommendations for their applicable…
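The abstract's central idea is that a point's Data Shapley value depends on which other points happen to be in the dataset, so it varies as data are redrawn from the underlying distribution. The minimal sketch below illustrates that idea under two loud assumptions: fresh datasets can be drawn from a known population (here Uniform(0, 1)), and the utility is a hypothetical toy (the largest value in a set). It computes the exact Shapley value of a fixed target point in each redrawn dataset, then forms a normal-approximation confidence interval for its expected value. The function names are hypothetical, and this naive resampling is not either of the paper's algorithms, which build on U-statistic theory and are not detailed in this summary.

```python
import math
import random
from itertools import combinations
from math import comb
from typing import Callable, Sequence

SetUtility = Callable[[Sequence[float]], float]


def shapley_of_target(target: float, others: Sequence[float], utility: SetUtility) -> float:
    """Exact Shapley value of `target` in the dataset `others` + [target]."""
    n = len(others) + 1
    total = 0.0
    for k in range(n):  # subset sizes 0..n-1
        # Average marginal contribution over all size-k subsets of the other points.
        mean_k = sum(
            utility(list(s) + [target]) - utility(list(s))
            for s in combinations(others, k)
        ) / comb(n - 1, k)
        total += mean_k
    return total / n  # equal weight 1/n per subset size


def max_utility(points: Sequence[float]) -> float:
    # Hypothetical toy utility: the largest value in the set (0 for the empty set).
    return max(points, default=0.0)


def distributional_ci(
    target: float, n: int, num_datasets: int = 200, seed: int = 0, z: float = 1.96
) -> tuple[float, float, float]:
    """Normal-approximation CI for the expected Shapley value of `target`
    when the other n - 1 points are redrawn from an assumed Uniform(0, 1)."""
    rng = random.Random(seed)
    vals = []
    for _ in range(num_datasets):
        others = [rng.random() for _ in range(n - 1)]  # one fresh dataset
        vals.append(shapley_of_target(target, others, max_utility))
    m = len(vals)
    mean = sum(vals) / m
    sd = math.sqrt(sum((v - mean) ** 2 for v in vals) / (m - 1))
    half = z * sd / math.sqrt(m)
    return mean, mean - half, mean + half


mean, lo, hi = distributional_ci(target=0.8, n=5)
print(f"E[phi] ~ {mean:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

Exact enumeration costs 2^(n-1) utility evaluations per dataset, so this only works for tiny n; at realistic sizes a sampling-based estimator would be needed.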
Taxonomy
Topics: Data Mining Algorithms and Applications · Machine Learning and Data Classification
