Beta Shapley: a Unified and Noise-reduced Data Valuation Framework for Machine Learning
Yongchan Kwon, James Zou

TL;DR
Beta Shapley is a generalized data valuation framework that relaxes the efficiency axiom of the Shapley value, unifies several existing valuation methods (with Data Shapley as a special case), and improves the detection of influential data points in machine learning tasks.
Contribution
It introduces Beta Shapley, a data valuation method that relaxes the efficiency axiom of the Shapley value and unifies existing approaches. The authors prove that it has several desirable statistical properties and propose efficient algorithms to estimate it.
Findings
Beta Shapley outperforms existing methods in detecting mislabeled data.
It effectively identifies influential data points impacting model performance.
It outperforms state-of-the-art data valuation methods on several downstream ML tasks, including learning with subsamples.
Abstract
Data Shapley has recently been proposed as a principled framework to quantify the contribution of an individual datum in machine learning. It can effectively identify helpful or harmful data points for a learning algorithm. In this paper, we propose Beta Shapley, which is a substantial generalization of Data Shapley. Beta Shapley arises naturally by relaxing the efficiency axiom of the Shapley value, which is not critical for machine learning settings. Beta Shapley unifies several popular data valuation methods and includes Data Shapley as a special case. Moreover, we prove that Beta Shapley has several desirable statistical properties and propose efficient algorithms to estimate it. We demonstrate that Beta Shapley outperforms state-of-the-art data valuation methods on several downstream ML tasks such as: 1) detecting mislabeled training data; 2) learning with subsamples; and 3)…
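To make the relaxation of the efficiency axiom concrete, here is a minimal sketch that computes values exactly on a four-point toy problem. It assumes the Beta(α, β) weight on cardinality j takes the form n · C(n−1, j−1) · B(j+β−1, n−j+α) / B(α, β); this formula is not given in the text above and is an assumption stated from the paper's definition as best understood. The names `beta_weight`, `beta_shapley`, `toy_utility`, and `QUALITY` are hypothetical, and the utility is an invented stand-in for model performance, not the authors' setup. The exact enumeration is exponential in n, so this is illustrative only and is not the efficient estimator the abstract mentions.

```python
from itertools import combinations
from math import comb, exp, lgamma


def log_beta_fn(a, b):
    """Log of the Beta function B(a, b), via log-gamma for stability."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_weight(j, n, alpha, beta):
    """Assumed normalized weight for cardinality j (subsets of size j - 1).

    With alpha = beta = 1 every weight equals 1, which recovers the
    uniform averaging used by Data Shapley.
    """
    log_ratio = log_beta_fn(j + beta - 1, n - j + alpha) - log_beta_fn(alpha, beta)
    return n * comb(n - 1, j - 1) * exp(log_ratio)


def beta_shapley(n, utility, alpha=1.0, beta=1.0):
    """Exact values for points 0..n-1 by enumerating every subset.

    `utility` maps a tuple of point indices to a real number.
    """
    values = []
    for i in range(n):
        others = [k for k in range(n) if k != i]
        total = 0.0
        for j in range(1, n + 1):
            subsets = list(combinations(others, j - 1))
            # Average marginal contribution of point i at this cardinality.
            marginal = sum(utility(s + (i,)) - utility(s) for s in subsets) / len(subsets)
            total += beta_weight(j, n, alpha, beta) * marginal
        values.append(total / n)
    return values


# Hypothetical toy data: three helpful points and one harmful point.
QUALITY = [1.0, 1.0, 1.0, -1.0]


def toy_utility(subset):
    """Saturating stand-in for model performance on a subset of points."""
    s = max(0.0, sum(QUALITY[k] for k in subset))
    return s / (1.0 + s)


if __name__ == "__main__":
    print("alpha=1, beta=1:", beta_shapley(4, toy_utility, 1.0, 1.0))
    print("alpha=4, beta=1:", beta_shapley(4, toy_utility, 4.0, 1.0))
```

Under this assumed weight form, the α = β = 1 values sum to the utility of the full set minus the utility of the empty set, which is the efficiency property. Other choices of (α, β) reweight the cardinalities and need not preserve that sum; in this sketch, α = 4, β = 1 places more weight on marginal contributions to small subsets.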
Taxonomy
Topics: Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI) · Bayesian Modeling and Causal Inference
