Towards Data Valuation via Asymmetric Data Shapley
Xi Zheng, Xiangyu Chang, Ruoxi Jia, Yong Tan

TL;DR
This paper introduces an asymmetric data Shapley framework that accounts for dataset structures in data valuation, along with an efficient algorithm for its computation, enhancing the accuracy of data contribution assessment in machine learning.
Contribution
The paper extends the traditional data Shapley value to incorporate dataset structures and dependencies, yielding a more accurate, structure-aware data valuation method.
Findings
The asymmetric data Shapley effectively captures dataset structures.
The proposed k-nearest neighbor-based algorithm computes asymmetric data Shapley values exactly and efficiently.
The framework demonstrates practical applicability across various tasks.
Abstract
As data emerges as a vital driver of technological and economic advancement, a key challenge is accurately quantifying its value in algorithmic decision-making. The Shapley value, a well-established concept from cooperative game theory, has been widely adopted to assess the contribution of individual data sources in supervised machine learning. However, its symmetry axiom assumes that all players in the cooperative game are homogeneous, which overlooks the complex structures and dependencies present in real-world datasets. To address this limitation, we extend the traditional data Shapley framework to asymmetric data Shapley, making it flexible enough to incorporate inherent structures within the datasets for structure-aware data valuation. We also introduce an efficient k-nearest neighbor-based algorithm for its exact computation. We demonstrate the practical applicability of our…
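The abstract contrasts the classical data Shapley value, whose symmetry axiom treats all data points as interchangeable, with an asymmetric extension. The sketch below is not the authors' definition or their k-nearest neighbor-based algorithm. It assumes one common way to break symmetry: average each point's marginal contribution only over orderings that respect a known precedence (here, a hypothetical "original before its near-duplicate copy" rule). The toy data, the 1-nearest-neighbor accuracy utility, and the names `knn_utility`, `shapley_by_permutations`, and `original_before_copy` are all illustrative assumptions. Enumerating all n! orderings is exponential, so this only works for tiny examples.

```python
from itertools import permutations
from typing import Callable, Dict, FrozenSet, Optional, Sequence, Tuple

# Hypothetical toy data (1-D features): points 0 and 1 are near-duplicates.
TRAIN_X = [0.0, 0.1, 5.0]
TRAIN_Y = [0, 0, 1]
VAL = [(0.05, 0), (4.9, 1)]  # (feature, label) validation pairs


def knn_utility(subset: FrozenSet[int]) -> float:
    """Validation accuracy of a 1-nearest-neighbor classifier trained on `subset`."""
    if not subset:
        return 0.0  # convention assumed here: the empty training set has utility 0
    correct = 0
    for x, y in VAL:
        # Nearest training point in the subset; ties broken by the smaller index.
        nearest = min(subset, key=lambda i: (abs(TRAIN_X[i] - x), i))
        correct += int(TRAIN_Y[nearest] == y)
    return correct / len(VAL)


def shapley_by_permutations(
    players: Sequence[int],
    utility: Callable[[FrozenSet[int]], float],
    admissible: Optional[Callable[[Tuple[int, ...]], bool]] = None,
) -> Dict[int, float]:
    """Average each player's marginal contribution over orderings of `players`.

    admissible=None averages over all n! orderings (classical, symmetric Shapley).
    Otherwise only orderings accepted by `admissible` are averaged, which is one
    assumed way to encode dataset structure asymmetrically.
    Exponential in n: for toy examples only.
    """
    totals = {p: 0.0 for p in players}
    n_orders = 0
    for order in permutations(players):
        if admissible is not None and not admissible(order):
            continue  # skip orderings that violate the assumed structure
        n_orders += 1
        coalition: FrozenSet[int] = frozenset()
        prev = utility(coalition)
        for p in order:
            coalition = coalition | {p}
            cur = utility(coalition)
            totals[p] += cur - prev  # marginal contribution of p in this ordering
            prev = cur
    if n_orders == 0:
        raise ValueError("no admissible orderings")
    return {p: t / n_orders for p, t in totals.items()}


def original_before_copy(order: Tuple[int, ...]) -> bool:
    """Hypothetical structure: point 0 is a source and point 1 is derived from it."""
    return order.index(0) < order.index(1)


if __name__ == "__main__":
    players = range(len(TRAIN_X))
    print(shapley_by_permutations(players, knn_utility))
    # prints {0: 0.25, 1: 0.25, 2: 0.5}
    print(shapley_by_permutations(players, knn_utility, original_before_copy))
    # prints {0: 0.5, 1: 0.0, 2: 0.5}
```

Under the symmetric average, the near-duplicates 0 and 1 split their shared credit equally. Requiring point 0 to precede its copy assigns that credit to the original and none to the copy. In both cases the values still sum to U(N) - U(∅) = 1.0, because every averaged ordering distributes exactly that total.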
Taxonomy
Topics: Probability and Risk Models · Advanced Database Systems and Queries · Insurance, Mortality, Demography, Risk Management
