CHG Shapley: Efficient Data Valuation and Selection towards Trustworthy Machine Learning

Huaiguang Cai

arXiv:2406.11730 · cs.GT · January 23, 2025 · 1 citation

TL;DR

This paper introduces CHG Shapley, an efficient method for data valuation and selection that approximates data contribution to model performance with reduced computational cost, enhancing trustworthy machine learning.

Contribution

The paper proposes the CHG utility function and derives a closed-form Shapley value, enabling fast data valuation and selection without extensive retraining.

Findings

01 Effective identification of high-value data points

02 Robustness in noisy and imbalanced datasets

03 Quadratic improvement in computational efficiency over existing marginal contribution-based methods

Abstract

Understanding the decision-making process of machine learning models is crucial for ensuring trustworthy machine learning. Data Shapley, a landmark study on data valuation, advances this understanding by assessing the contribution of each datum to model performance. However, the resource-intensive and time-consuming nature of multiple model retraining poses challenges for applying Data Shapley to large datasets. To address this, we propose the CHG (compound of Hardness and Gradient) utility function, which approximates the utility of each data subset on model performance in every training epoch. By deriving the closed-form Shapley value for each data point using the CHG utility function, we reduce the computational complexity to that of a single model retraining, achieving a quadratic improvement over existing marginal contribution-based methods. We further leverage CHG Shapley for…
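
For orientation, the marginal-contribution baseline that the abstract contrasts against can be sketched in a few lines. This is a minimal illustration of the standard Shapley value computed by subset enumeration, not the CHG method: the form of the CHG utility is not given on this page, so `toy_utility`, `scores`, and `exact_shapley` are hypothetical names chosen for the example. In Data Shapley, each call to the utility would correspond to retraining and evaluating a model on the given subset, which is what makes this approach expensive.

```python
from itertools import combinations
from math import comb


def exact_shapley(n, utility):
    """Exact Shapley values for points 0..n-1 under a set-valued utility.

    Enumerates every subset S of the other points and averages the
    marginal contribution utility(S + {i}) - utility(S) with the
    standard Shapley weight 1 / (n * C(n-1, |S|)).
    """
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # k = size of the subset that point i joins
            weight = 1.0 / (n * comb(n - 1, k))
            for subset in combinations(others, k):
                s = frozenset(subset)
                values[i] += weight * (utility(s | {i}) - utility(s))
    return values


# Hypothetical stand-in for "model performance when trained on subset":
# an additive per-point score plus a bonus when points 0 and 1 co-occur.
scores = [0.5, 0.3, 0.2, -0.1]


def toy_utility(subset):
    base = sum(scores[j] for j in subset)
    bonus = 0.2 if {0, 1} <= subset else 0.0
    return base + bonus


phi = exact_shapley(len(scores), toy_utility)
print([round(v, 6) for v in phi])
```

The interaction bonus is split evenly between points 0 and 1, and the values sum to the utility of the full set minus that of the empty set. The number of utility evaluations grows exponentially with the number of points, which is why practical methods rely on approximations or, as the abstract proposes, on a utility with a closed-form Shapley value.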

Taxonomy

Topics: Explainable Artificial Intelligence (XAI)