Accelerated Shapley Value Approximation for Data Evaluation
Lauren Watson, Zeno Kujawa, Rayna Andreeva, Hao-Tsung Yang, Tariq Elahi, Rik Sarkar

TL;DR
This paper introduces a faster method for approximating Shapley values in data valuation by leveraging structural properties of machine learning, significantly reducing computation while maintaining accuracy.
Contribution
It proposes δ-Shapley, a novel approach that uses small data subsets for efficient Shapley value approximation with theoretical guarantees.
Findings
Achieves up to 9.9x speedup in experiments.
Maintains data value ranking accuracy.
More efficient in pre-trained networks.
Abstract
Data valuation has found various applications in machine learning, such as data filtering, efficient learning and incentives for data sharing. The most popular current approach to data valuation is the Shapley value. While popular for its various applications, Shapley value is computationally expensive even to approximate, as it requires repeated iterations of training models on different subsets of data. In this paper we show that the Shapley value of data points can be approximated more efficiently by leveraging the structural properties of machine learning problems. We derive convergence guarantees on the accuracy of the approximate Shapley value for different learning settings including Stochastic Gradient Descent with convex and non-convex loss functions. Our analysis suggests that in fact models trained on small subsets are more important in the context of data valuation. Based on…
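To make the abstract's idea concrete, here is a minimal sketch of a Monte Carlo estimate of each data point's value that samples only small coalitions. It is an illustration written for this summary, not the authors' δ-Shapley algorithm: the function name `small_subset_value`, the size cap `max_subset_size`, the uniform choice of coalition size, and the additive toy utility are all assumptions. In a real setting, `utility` would train a model on the given subset and return a validation score, and that training step is the expensive part the paper aims to reduce.

```python
import random
from typing import Callable, List


def small_subset_value(
    n_points: int,
    utility: Callable[[List[int]], float],
    max_subset_size: int,
    n_samples: int,
    seed: int = 0,
) -> List[float]:
    """Average marginal contribution of each point over random small coalitions.

    Hypothetical sketch: coalitions contain at most `max_subset_size` other
    points, so every utility call only ever sees a small subset of the data.
    """
    rng = random.Random(seed)
    values = []
    for i in range(n_points):
        others = [j for j in range(n_points) if j != i]
        total = 0.0
        for _ in range(n_samples):
            # Pick a coalition size uniformly, then a uniform coalition of that size.
            k = rng.randint(0, min(max_subset_size, len(others)))
            subset = rng.sample(others, k)
            # Marginal contribution of point i to this coalition.
            total += utility(subset + [i]) - utility(subset)
        values.append(total / n_samples)
    return values


# Toy stand-in for "train on the subset and score the model" (an assumption):
# each point adds a fixed integer amount to the score.
WEIGHTS = [3, 1, 0, 2]


def toy_utility(subset: List[int]) -> float:
    return sum(WEIGHTS[j] for j in subset)


values = small_subset_value(4, toy_utility, max_subset_size=2, n_samples=50)
print(values)  # [3.0, 1.0, 0.0, 2.0]
```

Because the toy utility is additive, every marginal contribution of point i equals its weight, so the estimate recovers the weights exactly regardless of which coalitions are sampled. With a real, non-additive utility the estimate would depend on the size cap and on the number of samples.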
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Explainable Artificial Intelligence (XAI)
