Towards Efficient Data Valuation Based on the Shapley Value
Ruoxi Jia, David Dao, Boxin Wang, Frances Ann Hubis, Nick Hynes, Nezihe Merve Gurel, Bo Li, Ce Zhang, Dawn Song, Costas Spanos

TL;DR
This paper develops efficient algorithms for approximating the Shapley value as a measure of data worth. Such valuations can support fair profit sharing among data contributors and compensation after data breaches. The paper demonstrates the approach on benchmark datasets.
Contribution
It introduces new algorithms for fast approximation of the Shapley value in data valuation. These algorithms address the exponential cost of exact computation, and the paper demonstrates them on benchmark datasets.
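For reference, the standard definition of the Shapley value of a data point can be written as follows. The notation is assumed here and is not quoted from this page. I is the set of N training points, and U(S) is the utility of a model trained on S ⊆ I (for example, its validation accuracy). P_i^π is the set of points that precede i in the permutation π.

```latex
s_i \;=\; \frac{1}{N} \sum_{S \subseteq I \setminus \{i\}}
      \binom{N-1}{|S|}^{-1} \bigl[\, U(S \cup \{i\}) - U(S) \,\bigr]
  \;=\; \frac{1}{N!} \sum_{\pi \in \Pi(I)}
      \bigl[\, U(P_i^{\pi} \cup \{i\}) - U(P_i^{\pi}) \,\bigr]
```

The two forms agree because exactly |S|!(N-1-|S|)! permutations place precisely the points of S before i. The subset form ranges over 2^{N-1} subsets for each point, and every distinct subset requires training a model. This is the exponential cost the abstract mentions. The permutation form expresses the value as an expectation over random orderings, which is what makes sampling-based approximation possible.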
Findings
Proposed algorithms approximate the Shapley value far faster than exact computation
Each training instance receives an individual data value
Per-instance values demonstrated on various benchmark datasets
Abstract
"How much is my data worth?" is an increasingly common question posed by organizations and individuals alike. An answer to this question could allow, for instance, fairly distributing profits among multiple data contributors and determining prospective compensation when data breaches happen. In this paper, we study the problem of data valuation by utilizing the Shapley value, a popular notion of value which originated in cooperative game theory. The Shapley value defines a unique payoff scheme that satisfies many desiderata for the notion of data value. However, the Shapley value often requires exponential time to compute. To meet this challenge, we propose a repertoire of efficient algorithms for approximating the Shapley value. We also demonstrate the value of each training instance for various benchmark datasets.
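The following minimal Python sketch makes assumptions that are not taken from the paper. The utility is a caller-supplied function on sets of point indices. The toy `coverage_utility` and `LABELS` are hypothetical stand-ins for retraining and scoring a model. The sketch contrasts exact enumeration with the plain permutation-sampling Monte Carlo estimator, which is a common baseline and not one of the paper's proposed algorithms.

```python
import itertools
import math
import random
from typing import Callable, FrozenSet, List

# A utility maps a subset of training-point indices to a score. In data
# valuation this would be, e.g., test accuracy of a model trained on that
# subset; here it is a hypothetical toy stand-in (see coverage_utility).
Utility = Callable[[FrozenSet[int]], float]


def exact_shapley(n: int, utility: Utility) -> List[float]:
    """Exact Shapley values by enumerating all 2^(n-1) subsets per point."""
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # k = |S| runs from 0 to n-1
            weight = 1.0 / (n * math.comb(n - 1, k))
            for subset in itertools.combinations(others, k):
                s = frozenset(subset)
                values[i] += weight * (utility(s | {i}) - utility(s))
    return values


def monte_carlo_shapley(n: int, utility: Utility,
                        num_permutations: int, seed: int = 0) -> List[float]:
    """Permutation-sampling estimate: average each point's marginal
    contribution over uniformly random orderings of the n points."""
    rng = random.Random(seed)
    totals = [0.0] * n
    for _ in range(num_permutations):
        order = list(range(n))
        rng.shuffle(order)
        prefix: FrozenSet[int] = frozenset()
        prev = utility(prefix)
        for i in order:
            prefix = prefix | {i}
            curr = utility(prefix)
            totals[i] += curr - prev  # marginal contribution of point i
            prev = curr
    return [t / num_permutations for t in totals]


# Hypothetical toy data: points 0 and 1 carry the same label (duplicates).
LABELS = [0, 0, 1, 2]


def coverage_utility(subset: FrozenSet[int]) -> float:
    """Hypothetical toy utility: number of distinct labels in the subset."""
    return float(len({LABELS[j] for j in subset}))


if __name__ == "__main__":
    print(exact_shapley(len(LABELS), coverage_utility))
    print(monte_carlo_shapley(len(LABELS), coverage_utility, 2000))
```

On this toy utility the exact values are 0.5, 0.5, 1.0 and 1.0, because the two duplicate points split the credit for their shared label. Within each sampled permutation, the marginal contributions telescope to U(I) − U(∅), so the estimates always sum to the full utility. Exact enumeration touches on the order of 2^N subsets. Sampling instead costs `num_permutations` × N utility evaluations.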
Taxonomy
Topics: Game Theory and Voting Systems · Auction Theory and Applications · Blockchain Technology Applications and Security
