Provably Accurate Shapley Value Estimation via Leverage Score Sampling
Christopher Musco, R. Teal Witter

TL;DR
This paper introduces Leverage SHAP, an efficient algorithm for estimating Shapley values in machine learning models, providing provable accuracy guarantees with significantly fewer model evaluations than existing methods.
Contribution
Leverage SHAP is a novel, theoretically grounded modification of Kernel SHAP that uses leverage score sampling to achieve accurate estimates with O(n log n) evaluations.
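To make the idea of leverage score sampling concrete, the following is a minimal sketch of the generic technique for an ordinary least-squares problem. It is not the paper's algorithm. Kernel SHAP is commonly described as a weighted least-squares problem with one row per feature subset, where each entry of the target vector costs one model evaluation, so solving a subsampled problem means fewer evaluations. The matrix `A`, the target `b`, the sample size `m`, and both function names are illustrative assumptions.

```python
import numpy as np

def leverage_scores(A):
    """Leverage score of row i: squared norm of row i of an orthonormal
    basis for the column space of A (assumes A has full column rank)."""
    Q, _ = np.linalg.qr(A)  # reduced QR, Q has the same shape as A
    return np.sum(Q ** 2, axis=1)

def leverage_sampled_lstsq(A, b, m, rng):
    """Approximate argmin_x ||Ax - b||_2 using m rows sampled with
    probability proportional to their leverage scores."""
    scores = leverage_scores(A)
    probs = scores / scores.sum()
    idx = rng.choice(A.shape[0], size=m, replace=True, p=probs)
    # Rescale sampled rows so the subsampled objective is unbiased.
    scale = 1.0 / np.sqrt(m * probs[idx])
    # Only b[idx] is read: in a Shapley setting these would be the only
    # entries that require a model evaluation.
    x_hat, *_ = np.linalg.lstsq(A[idx] * scale[:, None], b[idx] * scale, rcond=None)
    return x_hat

# Hypothetical synthetic regression problem, for illustration only.
rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 5))
x_true = np.arange(1.0, 6.0)
b = A @ x_true + 0.01 * rng.standard_normal(2000)

x_full, *_ = np.linalg.lstsq(A, b, rcond=None)      # uses all 2000 targets
x_hat = leverage_sampled_lstsq(A, b, m=200, rng=rng)  # uses 200 targets
```

In this sketch the leverage scores are computed numerically from a QR factorization, which would be infeasible for a matrix with exponentially many rows. The reviews below credit the paper with analyzing the form of the leverage scores for the Shapley problem, which is what makes this kind of sampling practical there.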
Findings
Leverage SHAP outperforms Kernel SHAP in empirical tests.
Provides non-asymptotic complexity guarantees for Shapley value estimation.
Achieves accurate estimates with fewer model evaluations.
Abstract
Originally introduced in game theory, Shapley values have emerged as a central tool in explainable machine learning, where they are used to attribute model predictions to specific input features. However, computing Shapley values exactly is expensive: for a general model with n features, O(2^n) model evaluations are necessary. To address this issue, approximation algorithms are widely used. One of the most popular is the Kernel SHAP algorithm, which is model agnostic and remarkably effective in practice. However, to the best of our knowledge, Kernel SHAP has no strong non-asymptotic complexity guarantees. We address this issue by introducing Leverage SHAP, a light-weight modification of Kernel SHAP that provides provably accurate Shapley value estimates with just O(n log n) model evaluations. Our approach takes advantage of a connection between Shapley value estimation and…
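The exponential cost of exact computation can be seen in a short sketch. The code below uses the standard subset-enumeration formula for Shapley values and evaluates the value function once for every subset of the n players. The toy game (`WEIGHTS`, `toy_value`) and the function name `exact_shapley` are hypothetical, chosen only for illustration.

```python
from itertools import combinations
from math import factorial

def exact_shapley(value_fn, n):
    """Exact Shapley values for an n-player game by full enumeration.

    Returns the list of Shapley values and the number of value-function
    evaluations performed (one per subset, i.e. 2**n).
    """
    # Evaluate the value function once on every subset of {0, ..., n-1}.
    values = {}
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            s = frozenset(subset)
            values[s] = value_fn(s)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # k = size of a subset S that excludes player i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Weighted marginal contribution of i when joining S.
                phi[i] += weight * (values[s | {i}] - values[s])
    return phi, len(values)

# Hypothetical toy game: additive weights plus a bonus when 0 and 1 cooperate.
WEIGHTS = [1.0, 2.0, 3.0]

def toy_value(subset):
    total = sum(WEIGHTS[j] for j in subset)
    bonus = 4.0 if {0, 1} <= subset else 0.0
    return total + bonus

phi, n_evals = exact_shapley(toy_value, 3)
```

For this toy game the bonus is split evenly between players 0 and 1, and the values sum to the value of the full coalition. With 3 players the enumeration is trivial, but the evaluation count doubles with every added feature, which is why approximation methods such as Kernel SHAP are used in practice.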
Peer Reviews
Decision: ICLR 2025 Spotlight
Shapley values are a basic and important topic in interpretable AI and beyond, finding wide application in practice. The problem of efficiently estimating them well is a very well-motivated one. This paper makes a very nice and useful contribution to this problem. The key theoretical insight of analyzing the form of the leverage scores is simple but very clever and elegant, and allows them to make use of a very well-studied toolbox in statistics (although there is still technical work to be done
I do not see any major weaknesses. I do think it would be helpful for the authors to discuss the limitations of the Leverage SHAP algorithm a bit more (e.g. does it strictly dominate all prior algorithms?), and provide some context on what still remains open in this space (see below for related questions).
- Estimating Shapley scores accurately and efficiently is an important problem in explainable machine learning. The paper provides a theoretically principled approach for this problem.
- The approach seems to outperform Kernel SHAP and optimized Kernel SHAP baselines in the experiments.
- The main theoretical result (Theorem 1.1) is somewhat unsatisfactory as it does not directly compare the true and estimated Shapley values. The authors address this via Corollary 4.1, but it has a non-intuitive $\gamma$ term which can be large and makes the approximation guarantees weaker. Are there conditions under which $\gamma$ is guaranteed to be small? This would help clarify the limitations of the current theoretical results.
- The experiments could include more baselines like (
This paper is very well written, introduces the context of their work beautifully, and provides both a theoretical and practical contribution to the field.
A weakness is that it might feel niche, but as a non-specialist in interpretable AI, I cannot judge the importance of Shapley values. If they are important, then the authors' contribution is quite important, because its theoretical grounding removes some of the reliance on heuristics.
Taxonomy
Topics: Face and Expression Recognition
Methods: Shapley Additive Explanations · Lib
