Data Shapley: Equitable Valuation of Data for Machine Learning
Amirata Ghorbani, James Zou

TL;DR
This paper introduces data Shapley, a principled method for assigning a value to each training data point in supervised machine learning. The resulting values satisfy natural properties of equitable valuation and give insight into data quality, outliers, and which new data to acquire.
Contribution
The paper develops a data valuation framework for supervised learning based on Shapley values, together with Monte Carlo and gradient-based estimation methods that remain practical for complex models such as neural networks.
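For orientation, the Shapley value underlying this framework can be written as follows. This is a sketch in the standard notation, and the symbols are assumptions of this note rather than text from the summary above: $D$ is the training set of $n$ points, $V(S)$ is the performance score of the predictor trained on a subset $S$, and $C$ is an arbitrary scaling constant.

```latex
\phi_i \;=\; C \sum_{S \subseteq D \setminus \{i\}}
  \frac{V(S \cup \{i\}) - V(S)}{\binom{n-1}{|S|}}
```

In words, the value of point $i$ is its marginal contribution to performance, averaged over all subsets of the remaining data, with each subset size weighted equally.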
Findings
Data Shapley gives more insight into which data are valuable for a learning task than leave-one-out or leverage scores.
Data points with low Shapley values effectively identify outliers and corrupted data.
Data points with high Shapley values indicate what kind of new data to acquire to improve the model.
Abstract
As data becomes the fuel driving technological and economic growth, a fundamental challenge is how to quantify the value of data in algorithmic predictions and decisions. For example, in healthcare and consumer markets, it has been suggested that individuals should be compensated for the data that they generate, but it is not clear what is an equitable valuation for individual data. In this work, we develop a principled framework to address data valuation in the context of supervised machine learning. Given a learning algorithm trained on data points to produce a predictor, we propose data Shapley as a metric to quantify the value of each training datum to the predictor performance. Data Shapley value uniquely satisfies several natural properties of equitable data valuation. We develop Monte Carlo and gradient-based methods to efficiently estimate data Shapley values in practical…
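The Monte Carlo estimation mentioned in the abstract can be illustrated with a minimal permutation-sampling sketch. It is not the authors' implementation: the function names, the toy 1-D dataset, the 1-nearest-neighbour scorer, and the choice of 0.5 as the score of an empty training set are all assumptions made for this example, and the truncation step used in practice is omitted.

```python
import random


def monte_carlo_shapley(n_points, value_fn, num_permutations=200, seed=0):
    """Estimate data Shapley values by averaging each point's marginal
    contribution over random permutations of the training indices.

    value_fn (hypothetical interface) maps an iterable of training
    indices to a performance score.
    """
    rng = random.Random(seed)
    totals = [0.0] * n_points
    indices = list(range(n_points))
    for _ in range(num_permutations):
        rng.shuffle(indices)
        prefix = []
        prev_score = value_fn(prefix)  # score with no training data
        for i in indices:
            prefix.append(i)
            score = value_fn(prefix)
            totals[i] += score - prev_score  # marginal contribution of i
            prev_score = score
    return [t / num_permutations for t in totals]


# Toy 1-D data as (feature, label) pairs; purely illustrative.
# The third training point carries a label that disagrees with its neighbours.
TRAIN = [(0.0, 0), (1.0, 0), (1.2, 1), (9.0, 1), (10.0, 1)]
TEST = [(0.4, 0), (1.4, 0), (8.6, 1), (9.4, 1)]


def one_nn_accuracy(subset):
    """Test accuracy of a 1-nearest-neighbour classifier trained on the
    given subset of TRAIN indices. An empty subset scores 0.5, assumed
    here as chance level for two classes."""
    subset = list(subset)
    if not subset:
        return 0.5
    correct = 0
    for x, y in TEST:
        nearest = min(subset, key=lambda i: abs(TRAIN[i][0] - x))
        correct += int(TRAIN[nearest][1] == y)
    return correct / len(TEST)


if __name__ == "__main__":
    values = monte_carlo_shapley(len(TRAIN), one_nn_accuracy)
    for (x, y), v in zip(TRAIN, values):
        print(f"x={x:5.1f} label={y} estimated value={v:+.3f}")
```

Within each permutation the marginal contributions telescope, so the estimates always sum to the full-data score minus the empty-set score. Each permutation costs one model evaluation per training point, which is why cheaper approximations matter for larger models.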
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Forecasting Techniques and Applications · Machine Learning in Healthcare
