Model free variable importance for high dimensional data
Naofumi Hama, Masayoshi Mase, Art B. Owen

TL;DR
This paper introduces an efficient, model-free variable importance method called IGCS, which approximates cohort Shapley values in high-dimensional data, enabling practical analysis without access to the prediction function.
Contribution
We develop IGCS, an integrated gradient-based approximation of cohort Shapley, reducing computational cost and extending applicability to binary predictors in high-dimensional settings.
Findings
Over most of the relevant unit cube, the IGCS value function is close to a multilinear function for which IGCS matches cohort Shapley.
IGCS performs well on high energy physics data.
IGCS outperforms Monte Carlo sampling in a chemistry application.
Abstract
A model-agnostic variable importance method can be used with arbitrary prediction functions. Here we present some model-free methods that do not require access to the prediction function. This is useful when that function is proprietary and not available, or just extremely expensive. It is also useful when studying residuals from a model. The cohort Shapley (CS) method is model-free but has exponential cost in the dimension of the input space. A supervised on-manifold Shapley method from Frye et al. (2020) is also model-free but requires as input a second black box model that has to be trained for the Shapley value problem. We introduce an integrated gradient (IG) version of cohort Shapley, called IGCS, with cost . We show that, over the vast majority of the relevant unit cube, the IGCS value function is close to a multilinear function for which IGCS matches CS.…
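The exponential cost of cohort Shapley mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names `cohort_mean` and `cohort_shapley` are hypothetical, and similarity between subjects is assumed to mean exact equality of a variable's value, which is only one possible choice. The value of a variable subset is taken to be the mean response over the subjects that match the target subject on every variable in that subset, and the Shapley value averages marginal gains over all subsets. Only the data `X` and `y` are used, never a prediction function.

```python
from itertools import combinations
from math import factorial

import numpy as np


def cohort_mean(X, y, target, subset):
    """Mean response over subjects matching the target on every variable in `subset`."""
    mask = np.ones(len(y), dtype=bool)
    for j in subset:
        # Assumption: two subjects are "similar" on variable j if the values are equal.
        mask &= X[:, j] == X[target, j]
    # The target always matches itself, so the cohort is never empty.
    return y[mask].mean()


def cohort_shapley(X, y, target):
    """Exact Shapley values of the cohort-mean game for one target subject."""
    d = X.shape[1]
    phi = np.zeros(d)
    for j in range(d):
        others = [k for k in range(d) if k != j]
        for size in range(d):
            # Standard Shapley weight for a subset of this size.
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for u in combinations(others, size):
                gain = cohort_mean(X, y, target, u + (j,)) - cohort_mean(X, y, target, u)
                phi[j] += weight * gain
    return phi
```

Each variable requires a pass over all subsets of the remaining variables, so the number of value-function evaluations grows like 2^d. That is workable for a handful of variables and infeasible in high dimensions, which is the regime IGCS is aimed at.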
Taxonomy
Topics: Statistical Methods and Inference · Statistical Methods and Bayesian Inference · Model Reduction and Neural Networks
