Accelerated Shapley Value Approximation for Data Evaluation

Lauren Watson; Zeno Kujawa; Rayna Andreeva; Hao-Tsung Yang; Tariq Elahi; Rik Sarkar

arXiv:2311.05346·cs.LG·November 10, 2023·1 cites

TL;DR

This paper introduces a faster method for approximating Shapley values in data valuation. It leverages structural properties of machine learning problems to reduce computation while maintaining accuracy.

Contribution

The paper proposes δ-Shapley, an approach that uses small data subsets to approximate Shapley values efficiently, with theoretical guarantees.

Findings

01. Achieves up to a 9.9x speedup in experiments.

02. Maintains the accuracy of data value rankings.

03. Is more efficient when applied to pre-trained networks.

Abstract

Data valuation has found various applications in machine learning, such as data filtering, efficient learning and incentives for data sharing. The most popular current approach to data valuation is the Shapley value. While popular for its various applications, Shapley value is computationally expensive even to approximate, as it requires repeated iterations of training models on different subsets of data. In this paper we show that the Shapley value of data points can be approximated more efficiently by leveraging the structural properties of machine learning problems. We derive convergence guarantees on the accuracy of the approximate Shapley value for different learning settings including Stochastic Gradient Descent with convex and non-convex loss functions. Our analysis suggests that in fact models trained on small subsets are more important in the context of data valuation. Based on…
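To make the cost argument in the abstract concrete, here is a minimal Python sketch. It is not the paper's δ-Shapley algorithm; it only contrasts exact Shapley values (which enumerate every coalition) with a simple average of marginal contributions over randomly sampled small coalitions. The names `toy_utility`, `QUALITY`, `exact_shapley`, and `small_subset_estimate` are hypothetical, and `toy_utility` is an assumed stand-in for "train a model on this subset and measure its performance".

```python
import itertools
import math
import random

# Hypothetical per-point "quality" scores; in practice the utility would come
# from training a model on the subset and evaluating it.
QUALITY = [0.1, 0.5, 1.0, 2.0]


def toy_utility(subset):
    """Assumed utility with diminishing returns in the total quality of the subset."""
    return math.sqrt(sum(QUALITY[j] for j in subset))


def exact_shapley(n, utility):
    """Exact Shapley values by enumerating all coalitions (exponential in n)."""
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for combo in itertools.combinations(others, size):
                s = frozenset(combo)
                values[i] += weight * (utility(s | {i}) - utility(s))
    return values


def small_subset_estimate(n, utility, max_size, samples_per_size, rng):
    """Average marginal contribution over sampled coalitions of size <= max_size.

    This is an illustrative heuristic, not an unbiased Shapley estimator.
    """
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total, count = 0.0, 0
        for size in range(min(max_size, n - 1) + 1):
            for _ in range(samples_per_size):
                s = frozenset(rng.sample(others, size))
                total += utility(s | {i}) - utility(s)
                count += 1
        values[i] = total / count
    return values


if __name__ == "__main__":
    n = len(QUALITY)
    print("exact:        ", exact_shapley(n, toy_utility))
    print("small subsets:", small_subset_estimate(n, toy_utility, 1, 20, random.Random(0)))
```

Each call to `utility` stands for one model training run, so the exact computation needs on the order of 2^n runs, while the sampled version needs a number of runs fixed by `max_size` and `samples_per_size`, each on a small subset.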

Code & Models

Repositories

aai-institute/pyDVL
pytorch

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Explainable Artificial Intelligence (XAI)