Beta Shapley: a Unified and Noise-reduced Data Valuation Framework for Machine Learning

Yongchan Kwon; James Zou

arXiv:2110.14049 · cs.LG · January 20, 2022 · 21 cites

TL;DR

Beta Shapley is a generalized data valuation framework that relaxes traditional axioms, unifies existing methods, and improves the detection of influential data points in machine learning tasks.

Contribution

It introduces Beta Shapley, a new data valuation method that relaxes the efficiency axiom, unifies existing approaches, and offers improved statistical properties and estimation algorithms.

Findings

01 Beta Shapley outperforms existing methods in detecting mislabeled data.

02 It effectively identifies influential data points impacting model performance.

03 Beta Shapley demonstrates superior results across multiple ML tasks.

Abstract

Data Shapley has recently been proposed as a principled framework to quantify the contribution of each individual datum in machine learning. It can effectively identify helpful or harmful data points for a learning algorithm. In this paper, we propose Beta Shapley, which is a substantial generalization of Data Shapley. Beta Shapley arises naturally by relaxing the efficiency axiom of the Shapley value, which is not critical for machine learning settings. Beta Shapley unifies several popular data valuation methods and includes Data Shapley as a special case. Moreover, we prove that Beta Shapley has several desirable statistical properties and propose efficient algorithms to estimate it. We demonstrate that Beta Shapley outperforms state-of-the-art data valuation methods on several downstream ML tasks such as: 1) detecting mislabeled training data; 2) learning with subsamples; and 3)…
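To make "relaxing the efficiency axiom" and "includes Data Shapley as a special case" concrete, here is a minimal sketch that computes the values exactly on a four-point toy problem. Everything in it is illustrative rather than taken from this page: the toy `utility`, the names `beta_weight` and `beta_shapley`, and in particular the weight formula, which is written from memory of the paper and should be treated as an assumption. The sketch enumerates all subsets, so it is only feasible for tiny datasets; it is not the estimation algorithm the abstract refers to.

```python
from itertools import combinations
from math import comb, exp, lgamma

CLEAN = {0, 1, 2}   # hypothetical helpful points
NOISY = {3}         # hypothetical mislabeled point


def utility(subset):
    """Toy utility: diminishing returns from clean data, fixed penalty per noisy point."""
    return len(subset & CLEAN) ** 0.5 - 0.3 * len(subset & NOISY)


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_weight(j, n, alpha, beta):
    """Weight on marginal contributions to subsets of size j - 1 (assumed form).

    Normalized so that the weights for j = 1..n average to 1.
    """
    log_ratio = log_beta(j + beta - 1, n - j + alpha) - log_beta(alpha, beta)
    return n * comb(n - 1, j - 1) * exp(log_ratio)


def beta_shapley(utility_fn, n, alpha=1.0, beta=1.0):
    """Exact weighted average of marginal contributions for each of n points."""
    values = []
    for i in range(n):
        others = [k for k in range(n) if k != i]
        total = 0.0
        for j in range(1, n + 1):
            subsets = list(combinations(others, j - 1))
            # Average marginal contribution of point i to subsets of size j - 1.
            delta_j = sum(
                utility_fn(set(s) | {i}) - utility_fn(set(s)) for s in subsets
            ) / len(subsets)
            total += beta_weight(j, n, alpha, beta) * delta_j
        values.append(total / n)
    return values


shapley = beta_shapley(utility, 4, alpha=1, beta=1)    # uniform weights
beta_4_1 = beta_shapley(utility, 4, alpha=4, beta=1)   # emphasizes small subsets
```

Under these assumptions, `alpha = beta = 1` gives every cardinality a weight of 1, so the values sum to `utility` on the full set minus `utility` on the empty set, which is the efficiency axiom. With `alpha=4, beta=1` the weights favor small subsets and the values no longer need to sum to that total.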

Taxonomy

Topics: Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI) · Bayesian Modeling and Causal Inference