Kernel Banzhaf: A Fast and Robust Estimator for Banzhaf Values
Yurong Liu, R. Teal Witter, Flip Korn, Tarfah Alrashed, Dimitris Paparas, Christopher Musco, Juliana Freire

TL;DR
Kernel Banzhaf introduces a novel regression-based estimator for Banzhaf values, offering significant improvements in accuracy, efficiency, and robustness over traditional Monte Carlo methods, with strong theoretical guarantees.
Contribution
This work presents the first regression-based estimator for Banzhaf values, inspired by Kernel SHAP, enhancing computation speed and accuracy for feature importance in machine learning.
Findings
Outperforms Monte Carlo estimators in accuracy and efficiency
Demonstrates robustness to noise and better feature ranking recovery
Provides theoretical guarantees on estimator performance
Abstract
Banzhaf values provide a popular, interpretable alternative to the widely-used Shapley values for quantifying the importance of features in machine learning models. Like Shapley values, computing Banzhaf values exactly requires time exponential in the number of features, necessitating the use of efficient estimators. Existing estimators, however, are limited to Monte Carlo sampling methods. In this work, we introduce Kernel Banzhaf, the first regression-based estimator for Banzhaf values. Our approach leverages a novel regression formulation, whose exact solution corresponds to the exact Banzhaf values. Inspired by the success of Kernel SHAP for Shapley values, Kernel Banzhaf efficiently solves a sampled instance of this regression problem. Through empirical evaluations across eight datasets, we find that Kernel Banzhaf significantly outperforms existing Monte Carlo methods in terms of…
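To make the regression view concrete, here is a minimal Python sketch; it is not the authors' implementation. It computes exact Banzhaf values by enumerating every subset, then solves a least squares problem over all $2^n$ subsets and recovers the same numbers. The toy set function `value`, the helper names, and the +1/2 / -1/2 membership encoding of the design matrix are assumptions made for illustration; the paper's own notation may differ.

```python
from itertools import combinations

import numpy as np


def value(subset):
    """Hypothetical toy set function: additive weights plus one pairwise interaction."""
    weights = [1.0, 2.0, 3.0, 4.0]
    total = sum(weights[i] for i in subset)
    if 0 in subset and 1 in subset:
        total += 5.0
    return total


def all_subsets(n):
    """Yield every subset of {0, ..., n-1} as a frozenset."""
    for size in range(n + 1):
        for combo in combinations(range(n), size):
            yield frozenset(combo)


def exact_banzhaf(v, n):
    """Average marginal contribution of each feature over all 2^(n-1) subsets of the others."""
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for combo in combinations(others, size):
                s = frozenset(combo)
                phi[i] += v(s | {i}) - v(s)
    return phi / 2 ** (n - 1)


def regression_banzhaf(v, n):
    """Solve min_x ||Ax - b||_2 over all subsets (assumed encoding: +1/2 if i in S, else -1/2)."""
    subsets = list(all_subsets(n))
    A = np.array([[0.5 if i in s else -0.5 for i in range(n)] for s in subsets])
    b = np.array([v(s) for s in subsets])
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x


if __name__ == "__main__":
    print(exact_banzhaf(value, 4))
    print(regression_banzhaf(value, 4))
```

For this toy function, both routines agree up to floating-point error on (3.5, 4.5, 3.0, 4.0): the interaction bonus of 5 is active for half of the subsets each of the first two features can join, so each receives an extra 2.5. Both routines still touch all $2^n$ subsets, which is the cost a sampled estimator is meant to avoid.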
Peer Reviews
Decision · Submitted to ICLR 2025
1. The paper is well-organized and clearly explains theoretical results and algorithms. In particular, I like that the authors kept the main paper simple while postponing the heavy theories and additional experiments and their analysis to the appendices. 2. Kernel Banzhaf addresses a gap in the computation of Banzhaf values for arbitrary set functions, an area with limited prior research compared to Shapley values. 3. The algorithm has solid theoretical support, as demonstrated by Theorem 3.2,
1. While the paper introduces a practical and efficient method for estimating Banzhaf values, much of its foundation relies on adapting existing techniques developed for Shapley values and generic regression problems. 2. Kernel Banzhaf demonstrates accuracy in Banzhaf value estimation, yet its broader implications for data valuation and generative AI tasks have not been explored. In particular, the authors consider that being inapplicable to generative AI is a limitation of MSR. 3. Robustness
1. Few algorithms have been proposed to compute Banzhaf values for arbitrary set functions. This paper addresses this gap by introducing an algorithm that overcomes this limitation, representing a significant improvement. It also experimentally evaluates the estimator in relation to the true Banzhaf values, rather than relying just on convergence metrics. 2. Theorem 3.2 states that the Banzhaf values are the solution to the linear regression problem defined by matrix A and vector b. Theorem 3.3
1. While the theoretical underpinnings are well-developed, the paper may not provide a comprehensive assessment of the computational efficiency and practicality of the proposed method in real-world applications, such as a computational complexity analysis or empirical time/memory costs. 2. The study demonstrates the robustness of the Kernel Banzhaf algorithm primarily through relevant experiments. Figure 4 shows the horizontal line representing Kernel Banzhaf, which remains unchanged as noise level
A key strength of this research is the simplicity of the proposed estimator for the Banzhaf value. The method involves simply sampling subsets and solving a least squares problem, making the computation highly straightforward. Additionally, the theoretical complexity of the sampling process is studied. While an exact calculation requires all the $2^n$ subsets, the proposed approach reduces this to approximately $O(n \log n / \delta)$. This ease of implementation, along with the theoretical guara
There are no obvious weaknesses I found in this paper. If I have to mention a potential drawback, it might be that the Banzhaf value is less well-known compared to the Shapley value. However, as the authors discuss in Appendix H, the Banzhaf value can serve as a viable alternative to the Shapley value, and it would be ideal to see it become more widely studied alongside the Shapley value in the future.
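The "sample subsets, then solve a least squares problem" recipe that the reviews describe can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the paper's algorithm: subsets are drawn uniformly at random, each subset is paired with its complement (a variance-reduction choice assumed here), and the toy set function `value` and the name `sampled_banzhaf` are hypothetical.

```python
import numpy as np


def value(mask):
    """Hypothetical toy set function on a 0/1 membership vector.

    Additive weights 1..n plus a bonus of 5 when features 0, 1 and 2 are all present.
    """
    weights = np.arange(1.0, mask.size + 1.0)
    return float(weights @ mask + 5.0 * mask[0] * mask[1] * mask[2])


def sampled_banzhaf(v, n, num_samples, rng):
    """Estimate Banzhaf values from num_samples subsets via least squares.

    Assumptions of this sketch: uniform sampling, complement pairing,
    and a +1/2 / -1/2 membership encoding for the design matrix.
    """
    half = rng.integers(0, 2, size=(num_samples // 2, n))
    masks = np.vstack([half, 1 - half])  # each subset together with its complement
    A = masks - 0.5                      # +1/2 if the feature is in the subset, else -1/2
    b = np.array([v(m) for m in masks], dtype=float)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x


if __name__ == "__main__":
    n = 10
    rng = np.random.default_rng(0)
    estimate = sampled_banzhaf(value, n, 400, rng)
    # Exact values for this toy function, derived by hand: the triple bonus is
    # active for 1/4 of the subsets a participating feature can join, adding 1.25.
    exact = np.arange(1.0, n + 1.0)
    exact[:3] += 1.25
    print(np.abs(estimate - exact).max())
```

Here 400 evaluations of `value` stand in for the 1024 that exact enumeration would need at n = 10. Pairing each subset with its complement makes constant and even-order terms of the set function cancel exactly in the normal equations, so a purely additive function is recovered without error; only odd-order interactions, such as the triple bonus above, contribute sampling noise.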
