2D-Shapley: A Framework for Fragmented Data Valuation

Zhihong Liu; Hoang Anh Just; Xiangyu Chang; Xi Chen; Ruoxi Jia

arXiv:2306.10473 · cs.LG · July 28, 2023 · 1 citation

TL;DR

This paper introduces 2D-Shapley, a novel framework for valuing fragmented data sources with partial features and samples, enabling better data selection, interpretation, and diagnosis in machine learning models.

Contribution

It proposes a new counterfactual-based method and a theoretical framework for valuing fragmented data sources, addressing a gap in existing data valuation approaches.

Findings

01 Enables selection of useful data fragments.

02 Provides interpretation for sample-wise data values.

03 Facilitates fine-grained data issue diagnosis.

Abstract

Data valuation -- quantifying the contribution of individual data sources to certain predictive behaviors of a model -- is of great importance to enhancing the transparency of machine learning and designing incentive systems for data sharing. Existing work has focused on evaluating data sources that share a feature space or a sample space. How to value fragmented data sources, each of which contains only partial features and samples, remains an open question. We start by presenting a method to calculate the counterfactual of removing a fragment from the aggregated data matrix. Based on the counterfactual calculation, we further propose 2D-Shapley, a theoretical framework for fragmented data valuation that uniquely satisfies some appealing axioms in the fragmented data context. 2D-Shapley empowers a range of new use cases, such as selecting useful data fragments, providing interpretation for…
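The counterfactual idea in the abstract can be illustrated with a small permutation-sampling sketch. It is not the authors' implementation. It assumes that the marginal contribution of a cell (sample i, feature j), given a set of other samples S and other features F, takes the inclusion-exclusion form u(S+i, F+j) + u(S, F) - u(S+i, F) - u(S, F+j), averaged over random orderings of samples and features. The function name `two_d_shapley`, the `utility(rows, cols)` interface, and the toy `WEIGHTS` matrix are all hypothetical, introduced only for illustration.

```python
import random

# Hypothetical toy data: the "worth" of each cell in a 2 x 3 data matrix.
WEIGHTS = [[1, 0, 2],
           [0, 3, 0]]


def additive_utility(rows, cols):
    """Toy utility (an assumption): total weight of the selected sub-matrix."""
    return sum(WEIGHTS[i][j] for i in rows for j in cols)


def two_d_shapley(n_rows, n_cols, utility, n_perms=200, seed=0):
    """Monte Carlo estimate of a value for every (row, column) cell.

    utility(rows, cols) scores the sub-matrix restricted to the given
    frozensets of row and column indices.
    """
    rng = random.Random(seed)
    values = [[0.0] * n_cols for _ in range(n_rows)]
    for _ in range(n_perms):
        row_perm = rng.sample(range(n_rows), n_rows)  # random sample order
        col_perm = rng.sample(range(n_cols), n_cols)  # random feature order
        # u[a][b] = utility of the first a rows and first b columns in this order.
        u = [[utility(frozenset(row_perm[:a]), frozenset(col_perm[:b]))
              for b in range(n_cols + 1)]
             for a in range(n_rows + 1)]
        for a, i in enumerate(row_perm):
            for b, j in enumerate(col_perm):
                # Inclusion-exclusion marginal of adding cell (i, j)
                # to the rows and columns that precede it.
                values[i][j] += u[a + 1][b + 1] + u[a][b] - u[a + 1][b] - u[a][b + 1]
    return [[v / n_perms for v in row] for row in values]


if __name__ == "__main__":
    for row in two_d_shapley(2, 3, additive_utility, n_perms=20):
        print(row)
```

With the additive toy utility, every marginal equals the weight of the cell itself, so the estimate recovers `WEIGHTS` for any number of permutations. For a general utility that is zero on empty row or column sets, the cell values sum to the utility of the full matrix, because the marginals telescope within each pair of orderings.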

Code & Models

Repositories

ruoxi-jia-group/2dshapley · Official

Videos

2D-Shapley: A Framework for Fragmented Data Valuation · SlidesLive

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Data Quality and Management · Bayesian Modeling and Causal Inference