Variance reduced Shapley value estimation for trustworthy data valuation
Mengmeng Wu, Ruoxi Jia, Changle Lin, Wei Huang, Xiangyu Chang

TL;DR
This paper introduces VRDS, a stratified sampling method that reduces the variance of Shapley value estimates used for data valuation, making valuations more reliable for data trading and decision-making.
Contribution
The paper develops a stratified sampling approach to Shapley value estimation. It analyzes how to form the strata, how many samples to allocate to each stratum, and the resulting sample complexity, and it demonstrates lower estimation variance than standard permutation sampling.
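The core idea of stratified Shapley estimation can be sketched in Python. This is a minimal sketch, assuming coalitions are stratified by their size k, which is one natural stratification for the Shapley value; the paper's exact stratification and per-stratum allocation rule may differ, so the number of draws per stratum is left as a user-supplied, hypothetical `samples_per_stratum` parameter. The names `stratified_shapley` and `utility` are illustrative. In data valuation, `utility(S)` would typically be a model's validation performance after training on subset S; here it is assumed to be any Python function of a frozenset of point indices.

```python
import random
from typing import Callable, FrozenSet, List, Sequence

# Hypothetical utility type: maps a coalition of data-point indices to a score.
Utility = Callable[[FrozenSet[int]], float]


def stratified_shapley(
    n: int,
    utility: Utility,
    samples_per_stratum: Sequence[int],
    seed: int = 0,
) -> List[float]:
    """Size-stratified Monte Carlo estimate of each point's Shapley value.

    Stratum k (k = 0..n-1) holds the coalitions S of N \\ {i} with |S| = k.
    phi_i = (1/n) * sum_k mean_{|S|=k} [U(S | {i}) - U(S)], so each stratum
    mean is estimated separately and the strata are averaged with weight 1/n.
    """
    if len(samples_per_stratum) != n or min(samples_per_stratum) < 1:
        raise ValueError("need at least one sample for each of the n strata")
    rng = random.Random(seed)
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k, m_k in enumerate(samples_per_stratum):
            stratum_sum = 0.0
            for _ in range(m_k):
                # Draw a uniformly random size-k coalition that excludes i.
                S = frozenset(rng.sample(others, k))
                stratum_sum += utility(S | {i}) - utility(S)
            total += stratum_sum / m_k  # estimated stratum mean
        values.append(total / n)  # equal weight 1/n per stratum
    return values


if __name__ == "__main__":
    # Toy utility: a coalition is "useful" once it has at least two points.
    threshold = lambda S: 1.0 if len(S) >= 2 else 0.0
    # Each entry is 1/3: the marginal contribution depends only on |S|.
    print(stratified_shapley(3, threshold, samples_per_stratum=[4, 4, 4]))
```

For a utility that depends only on coalition size, every stratum has zero within-stratum variance, so this estimator is exact, whereas permutation sampling would still fluctuate because the size of the predecessor set is random. Note that this sketch spends two utility evaluations per sampled coalition per point, so comparisons with permutation sampling should be made at an equal evaluation budget.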
Findings
VRDS reduces estimation variance compared to permutation sampling.
Theoretical analysis of stratification and sample complexity is provided.
VRDS is effective across different types of datasets and in data removal applications.
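Why stratification reduces variance can be summarized with a standard identity. The notation below (player set N with n points, utility U, stratum means μ_{i,k}, within-stratum standard deviations σ_{i,k}, per-stratum sample counts m_k out of a budget m) is ours, and the allocation shown is classical Neyman allocation, which is not necessarily the rule derived in the paper.

```latex
% Stratify the coalitions S \subseteq N \setminus \{i\} by size k = 0, ..., n-1.
\phi_i = \frac{1}{n} \sum_{k=0}^{n-1} \mu_{i,k},
\qquad
\mu_{i,k} = \binom{n-1}{k}^{-1}
  \sum_{\substack{S \subseteq N \setminus \{i\} \\ |S| = k}}
  \bigl[ U(S \cup \{i\}) - U(S) \bigr]

% Estimate each \mu_{i,k} by the mean of m_k independent uniform draws from stratum k,
% where \sigma_{i,k}^2 is the variance of the marginal contribution within stratum k:
\hat\phi_i = \frac{1}{n} \sum_{k=0}^{n-1} \hat\mu_{i,k},
\qquad
\operatorname{Var}(\hat\phi_i) = \frac{1}{n^2} \sum_{k=0}^{n-1} \frac{\sigma_{i,k}^2}{m_k}

% Minimizing over m_k subject to \sum_k m_k = m (continuous relaxation):
m_k^\star = m \, \frac{\sigma_{i,k}}{\sum_{j} \sigma_{i,j}},
\qquad
\operatorname{Var}(\hat\phi_i) = \frac{1}{n^2 m} \Bigl( \sum_{k} \sigma_{i,k} \Bigr)^2

% For comparison, one random permutation yields a predecessor set whose size k is
% uniform on {0, ..., n-1}, so m permutations give, by the law of total variance,
\operatorname{Var}(\hat\phi_i^{\mathrm{perm}})
  = \frac{1}{m} \Bigl[ \frac{1}{n} \sum_{k} \sigma_{i,k}^2
  + \frac{1}{n} \sum_{k} \bigl( \mu_{i,k} - \phi_i \bigr)^2 \Bigr]
```

Even with the simple proportional allocation m_k = m/n, the stratified variance is (1/(nm)) Σ_k σ_{i,k}^2, which removes the between-strata term (1/(nm)) Σ_k (μ_{i,k} − φ_i)^2 that permutation sampling pays. In practice every stratum still needs at least one draw, even when its estimated σ_{i,k} is zero.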
Abstract
Data valuation, especially quantifying the value of data in algorithmic prediction and decision-making, is a fundamental problem in data trading. The most widely used approach defines the value of each data point as its data Shapley value and approximates it with the permutation sampling algorithm. Because the large estimation variance of permutation sampling hinders the development of data marketplaces, we propose a more robust data valuation method based on stratified sampling, named variance reduced data Shapley (VRDS). We show theoretically how to stratify, how many samples to take in each stratum, and what sample complexity VRDS achieves. Finally, we illustrate the effectiveness of VRDS on different types of datasets and in data removal applications.
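For reference, the permutation sampling baseline named in the abstract can be sketched as follows. This is a minimal sketch, not the authors' code: `permutation_shapley` and `utility` are hypothetical names, and `utility(S)` is assumed to be any Python function scoring a frozenset of data-point indices (in practice, for example, validation accuracy of a model trained on S).

```python
import random
from typing import Callable, FrozenSet, List

# Hypothetical utility type: maps a coalition of data-point indices to a score.
Utility = Callable[[FrozenSet[int]], float]


def permutation_shapley(
    n: int, utility: Utility, num_perms: int, seed: int = 0
) -> List[float]:
    """Monte Carlo data Shapley via uniformly random permutations.

    For each sampled permutation, every point's marginal contribution is
    U(predecessors + {i}) - U(predecessors); the estimate is the average
    marginal contribution over all sampled permutations.
    """
    rng = random.Random(seed)
    values = [0.0] * n
    points = list(range(n))
    for _ in range(num_perms):
        rng.shuffle(points)  # one uniformly random ordering of the data
        coalition: set = set()
        prev = utility(frozenset(coalition))
        for i in points:
            coalition.add(i)
            curr = utility(frozenset(coalition))
            values[i] += curr - prev  # marginal contribution of point i
            prev = curr
    return [v / num_perms for v in values]


if __name__ == "__main__":
    weights = [1.0, 2.0, 3.0]
    additive = lambda S: sum(weights[j] for j in S)
    # For an additive utility every permutation gives the exact answer.
    print(permutation_shapley(3, additive, num_perms=50))
```

Each permutation costs n + 1 utility evaluations and updates every point at once. The per-point estimates are unbiased, but for non-additive utilities they fluctuate with the random position of each point in the ordering, which is the source of the variance that VRDS targets.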
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Probability and Risk Models · Data Quality and Management
