Provably Accurate Shapley Value Estimation via Leverage Score Sampling

Christopher Musco; R. Teal Witter

arXiv:2410.01917·cs.LG·March 11, 2025·3 cites

TL;DR

This paper introduces Leverage SHAP, an efficient algorithm for estimating Shapley values in machine learning models, providing provable accuracy guarantees with significantly fewer model evaluations than existing methods.

Contribution

Leverage SHAP is a novel, theoretically grounded modification of Kernel SHAP that uses leverage score sampling to achieve accurate estimates with O(n log n) evaluations.
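As a rough illustration of the general technique named here, the sketch below applies leverage score sampling to an ordinary least-squares problem: it computes each row's leverage score, samples rows with probability proportional to that score, reweights them, and solves the smaller problem. This is generic leverage score sampling, not the Leverage SHAP algorithm itself; the function names `leverage_scores` and `leverage_sampled_lstsq`, the choice to sample with replacement, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def leverage_scores(A):
    """Leverage score of each row of A (assumes A has full column rank)."""
    # Reduced QR: Q has orthonormal columns spanning the column space of A.
    Q, _ = np.linalg.qr(A)
    # The leverage score of row i is the squared norm of row i of Q.
    return np.sum(Q ** 2, axis=1)


def leverage_sampled_lstsq(A, b, m, rng):
    """Solve min_x ||Ax - b|| approximately from m rows sampled by leverage score."""
    scores = leverage_scores(A)
    probs = scores / scores.sum()
    # Hypothetical design choice: i.i.d. sampling with replacement.
    idx = rng.choice(A.shape[0], size=m, replace=True, p=probs)
    # Reweight each sampled row so the subsampled objective is unbiased.
    scale = 1.0 / np.sqrt(m * probs[idx])
    x, *_ = np.linalg.lstsq(A[idx] * scale[:, None], b[idx] * scale, rcond=None)
    return x


# Toy, noise-free regression problem (illustrative data only).
rng = np.random.default_rng(0)
A = rng.normal(size=(200, 3))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true

x_hat = leverage_sampled_lstsq(A, b, m=40, rng=rng)
print(np.allclose(x_hat, x_true))
```

In this sketch only the 40 sampled entries of `b` are read when solving, which mirrors the idea of limiting how many expensive evaluations are needed.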

Findings

01

Leverage SHAP outperforms Kernel SHAP in empirical tests.

02

Provides non-asymptotic complexity guarantees for Shapley value estimation.

03

Achieves accurate estimates with fewer model evaluations.

Abstract

Originally introduced in game theory, Shapley values have emerged as a central tool in explainable machine learning, where they are used to attribute model predictions to specific input features. However, computing Shapley values exactly is expensive: for a general model with $n$ features, $O(2^{n})$ model evaluations are necessary. To address this issue, approximation algorithms are widely used. One of the most popular is the Kernel SHAP algorithm, which is model agnostic and remarkably effective in practice. However, to the best of our knowledge, Kernel SHAP has no strong non-asymptotic complexity guarantees. We address this issue by introducing Leverage SHAP, a light-weight modification of Kernel SHAP that provides provably accurate Shapley value estimates with just $O(n \log n)$ model evaluations. Our approach takes advantage of a connection between Shapley value estimation and…
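To make the exponential cost of exact computation concrete, here is a minimal sketch that computes Shapley values by enumerating subsets. The set function `value` stands in for a model evaluated on a subset of features; that framing, the helper `count_distinct_calls`, and the toy additive game are assumptions for illustration and are not taken from the paper.

```python
from itertools import combinations
from math import comb


def exact_shapley(value, n):
    """Exact Shapley values of a set function `value` over players 0..n-1."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! = 1 / (n * C(n-1, |S|)).
            weight = 1.0 / (n * comb(n - 1, size))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of player i to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def count_distinct_calls(value):
    """Wrap `value` and record which distinct subsets it is asked about."""
    seen = set()

    def wrapped(s):
        seen.add(s)
        return value(s)

    return wrapped, seen


# Toy additive game: each player's Shapley value should equal its own weight.
weights = [1.0, 2.0, 3.0, 4.0]
game, seen = count_distinct_calls(lambda s: sum(weights[j] for j in s))
phi = exact_shapley(game, len(weights))
print(phi, len(seen))
```

The counter records every one of the $2^{n}$ subsets of the $n$ players, which is why exact enumeration becomes impractical as $n$ grows.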

Peer Reviews

Decision·ICLR 2025 Spotlight

Reviewer 01 · Rating 8 · Confidence 4

Strengths

Shapley values are a basic and important topic in interpretable AI and beyond, finding wide application in practice. The problem of efficiently estimating them well is a very well-motivated one. This paper makes a very nice and useful contribution to this problem. The key theoretical insight of analyzing the form of the leverage scores is simple but very clever and elegant, and allows them to make use of a very well-studied toolbox in statistics (although there is still technical work to be done

Weaknesses

I do not see any major weaknesses. I do think it would be helpful for the authors to discuss the limitations of the Leverage SHAP algorithm a bit more (e.g. does it strictly dominate all prior algorithms?), and provide some context on what still remains open in this space (see below for related questions).

Reviewer 02 · Rating 6 · Confidence 2

Strengths

- Estimating Shapley scores accurately and efficiently is an important problem in explainable machine learning. The paper provides a theoretically principled approach for this problem.
- The approach seems to outperform Kernel SHAP and optimized Kernel SHAP baselines in the experiments.

Weaknesses

- The main theoretical result (Theorem 1.1) is somewhat unsatisfactory as it does not directly compare the true and estimated Shapley values. The authors address this via Corollary 4.1, but it has a non-intuitive $\gamma$ term which can be large and makes the approximation guarantees weaker. Are there conditions under which $\gamma$ is guaranteed to be small? This would help to better understand the limitations of the current theoretical results.
- The experiments could include more baselines like (

Reviewer 03 · Rating 8 · Confidence 3

Strengths

This paper is very well written, introduces the context of their work beautifully, and provides both a theoretical and practical contribution to the field.

Weaknesses

A weakness is that it might feel niche, but as a non-specialist in interpretable AI, I cannot judge the importance of Shapley values. If they are important, then the authors' contribution is quite important, because its theoretical grounding removes some of the reliance on heuristics.

Videos

Provably Accurate Shapley Value Estimation via Leverage Score Sampling · slideslive

Taxonomy

Topics: Face and Expression Recognition

Methods: Shapley Additive Explanations · Lib