Statistically Valid Variable Importance Assessment through Conditional Permutations
Ahmad Chamma (1, 2, 3), Denis A. Engemann (4), Bertrand Thirion (1, 2, 3) ((1) Inria, (2) Universite Paris Saclay, (3) CEA, (4) Roche Pharma Research, Early Development, Neuroscience, Rare Diseases, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Basel, Switzerland)

TL;DR
This paper introduces Conditional Permutation Importance (CPI), a statistically valid, model-agnostic method for variable importance assessment that effectively handles correlated variables and outperforms standard permutation methods in accuracy and reliability.
Contribution
The paper develops CPI, a new importance measure that overcomes limitations of existing permutation methods, with theoretical guarantees and practical effectiveness demonstrated on neural networks and real data.
Findings
CPI provides accurate type-I error control.
CPI outperforms standard permutation importance in benchmarks.
CPI yields more parsimonious variable selection in real-world data.
Abstract
Variable importance assessment has become a crucial step in machine-learning applications when using complex learners, such as deep neural networks, on large-scale data. Removal-based importance assessment is currently the reference approach, particularly when statistical guarantees are sought to justify variable inclusion. It is often implemented with variable permutation schemes. On the flip side, these approaches risk misidentifying unimportant variables as important in the presence of correlations among covariates. Here we develop a systematic approach for studying Conditional Permutation Importance (CPI) that is model agnostic and computationally lean, as well as reusable benchmarks of state-of-the-art variable importance estimators. We show theoretically and empirically that CPI overcomes the limitations of standard permutation importance by providing accurate type-I…
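The contrast the abstract draws between standard and conditional permutation can be illustrated with a small simulation. The sketch below is not the authors' implementation: it assumes one common way to realize a conditional permutation (regress the variable of interest on the other covariates with a linear model, then shuffle only the residuals), uses a closed-form ridge regression as a stand-in learner, and runs on synthetic data. All function names, the penalty `alpha`, and the data-generating process are hypothetical choices made for illustration.

```python
import numpy as np


def fit_ridge(X, y, alpha):
    """Closed-form ridge regression without intercept (data assumed centered)."""
    p = X.shape[1]
    coef = np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)
    return lambda Z: Z @ coef


def fit_ols(X, y):
    """Ordinary least squares with intercept; returns a predict function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ coef


def importance(predict, X_train, X_test, y_test, j, rng, conditional):
    """Mean increase in squared error on the test set after perturbing column j."""
    X_new = X_test.copy()
    if conditional:
        # Assumed scheme: predict x_j from the other covariates (linear model),
        # then shuffle only the part of x_j that they do not explain.
        predict_xj = fit_ols(np.delete(X_train, j, axis=1), X_train[:, j])
        x_hat = predict_xj(np.delete(X_test, j, axis=1))
        X_new[:, j] = x_hat + rng.permutation(X_test[:, j] - x_hat)
    else:
        # Standard scheme: shuffle the whole column, breaking its correlations.
        X_new[:, j] = rng.permutation(X_test[:, j])
    loss_before = (y_test - predict(X_test)) ** 2
    loss_after = (y_test - predict(X_new)) ** 2
    return float(np.mean(loss_after - loss_before))


# Synthetic data: y depends on x0 only; x1 is a noisy copy of x0.
rng = np.random.default_rng(0)
n = 4000
x0 = rng.normal(size=n)
x1 = x0 + 0.3 * rng.normal(size=n)
X = np.column_stack([x0, x1])
y = x0 + 0.5 * rng.normal(size=n)

X_train, X_test = X[:2000], X[2000:]
y_train, y_test = y[:2000], y[2000:]

# A heavily regularized learner spreads weight across both correlated columns.
predict = fit_ridge(X_train, y_train, alpha=1000.0)

results = {}
for j in range(X.shape[1]):
    results[j] = {
        "standard": importance(predict, X_train, X_test, y_test, j, rng, False),
        "conditional": importance(predict, X_train, X_test, y_test, j, rng, True),
    }
    print(f"x{j}: standard={results[j]['standard']:.3f} "
          f"conditional={results[j]['conditional']:.3f}")
```

In this setup the standard score for the null variable x1 is clearly positive because shuffling the whole column destroys its correlation with x0, while the conditional score stays near zero. A significance test on the per-sample loss differences would be the natural next step toward the type-I error control the paper discusses; it is left out here to keep the sketch short.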
Taxonomy
Topics: Statistical Methods in Clinical Trials · Machine Learning in Healthcare · Health Systems, Economic Evaluations, Quality of Life
Methods: FLIP
