Asymptotic Unbiasedness of the Permutation Importance Measure in Random Forest Models
Burim Ramosaj, Markus Pauly

TL;DR
This paper provides theoretical guarantees for the permutation importance measure in Random Forests, showing its asymptotic unbiasedness under certain assumptions, supported by extensive simulations.
Contribution
The paper establishes the asymptotic unbiasedness of the permutation importance measure in Random Forests, a theoretical result that supports its use for variable selection in high-dimensional settings.
Findings
Permutation importance is asymptotically unbiased under specific assumptions.
Theoretical guarantees are supported by extensive simulation results.
Permutation importance reliably indicates informative variables in high-dimensional models.
Abstract
Variable selection in sparse regression models is an important task as applications ranging from biomedical research to econometrics have shown. Especially for higher dimensional regression problems, for which the link function between response and covariates cannot be directly detected, the selection of informative variables is challenging. Under these circumstances, the Random Forest method is a helpful tool to predict new outcomes while delivering measures for variable selection. One common approach is the usage of the permutation importance. Due to its intuitive idea and flexible usage, it is important to explore circumstances, for which the permutation importance based on Random Forest correctly indicates informative covariates. Regarding the latter, we deliver theoretical guarantees for the validity of the permutation importance measure under specific assumptions and prove its…
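To make the idea behind permutation importance concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes a toy bagged ensemble of depth-one regression trees as a stand-in for a Random Forest (with no random feature subsampling), and it computes the usual out-of-bag version of the measure: for each tree, the increase in out-of-bag mean squared error after permuting one covariate, averaged over trees. All function names and the simulated data are hypothetical.

```python
import random


def fit_stump(X, y):
    """Fit a depth-1 regression tree; returns (feature, threshold, left_mean, right_mean)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue  # skip splits that leave one side empty
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    return best[1:]


def predict(stump, row):
    j, t, ml, mr = stump
    return ml if row[j] <= t else mr


def mse(stump, X, y):
    return sum((predict(stump, r) - yi) ** 2 for r, yi in zip(X, y)) / len(y)


def oob_permutation_importance(X, y, n_trees=20, seed=0):
    """Average per-tree increase in out-of-bag MSE after permuting each covariate."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    totals, used = [0.0] * p, 0
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag = set(boot)
        oob = [i for i in range(n) if i not in in_bag]  # out-of-bag indices
        if not oob:
            continue
        stump = fit_stump([X[i] for i in boot], [y[i] for i in boot])
        X_oob, y_oob = [X[i] for i in oob], [y[i] for i in oob]
        base = mse(stump, X_oob, y_oob)
        for j in range(p):
            col = [r[j] for r in X_oob]
            rng.shuffle(col)  # break the link between covariate j and the response
            X_perm = [r[:j] + [v] + r[j + 1:] for r, v in zip(X_oob, col)]
            totals[j] += mse(stump, X_perm, y_oob) - base
        used += 1
    return [t / used for t in totals]


# Simulated sparse model (an assumption for illustration): only covariate 0 is informative.
data_rng = random.Random(1)
X = [[data_rng.uniform(-1, 1) for _ in range(3)] for _ in range(120)]
y = [3.0 * row[0] + data_rng.gauss(0, 0.1) for row in X]

importance = oob_permutation_importance(X, y)
print(importance)
```

In this simulated setting the informative covariate should receive a clearly positive score while the noise covariates stay near zero, which is the behaviour the paper's unbiasedness result is about. A real analysis would use full-depth trees with random feature subsampling rather than this toy ensemble.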
Taxonomy
Topics: Statistical Methods and Inference · Statistical Methods in Clinical Trials · Statistical Methods and Bayesian Inference
