A Central Limit Theorem for the permutation importance measure

Nico Föge; Lena Schmid; Marc Ditzhaus; Markus Pauly

arXiv:2412.13020 · math.ST · December 18, 2025

TL;DR

This paper proves a Central Limit Theorem for the Random Forest Permutation Importance Measure (RFPIM) using U-Statistics theory, giving a theoretical basis for its asymptotic distribution.

Contribution

It offers the first formal proof of a CLT for RFPIM, adding a description of its asymptotic distribution to earlier empirical findings and consistency results.

Findings

01 Proves a CLT for RFPIM under conditions on the regression functions and error terms, assuming a random number of trees

02 Uses U-Statistics theory for the proof

03 Includes a simulation study to illustrate the results

Abstract

Random Forests have become a widely used tool in machine learning since their introduction in 2001, known for their strong performance in classification and regression tasks. One key feature of Random Forests is the Random Forest Permutation Importance Measure (RFPIM), an internal, non-parametric measure of variable importance. While widely used, theoretical work on RFPIM is sparse, and most research has focused on empirical findings. However, recent progress has been made, such as establishing consistency of the RFPIM, although a mathematical analysis of its asymptotic distribution is still missing. In this paper, we provide a formal proof of a Central Limit Theorem for RFPIM using U-Statistics theory. Our approach deviates from the conventional Random Forest model by assuming a random number of trees and imposing conditions on the regression functions and error terms, which must be…
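
To make the quantity in the abstract concrete, here is a minimal sketch of an out-of-bag permutation importance in the style usually attributed to Random Forests: for each tree, the out-of-bag mean squared error is compared before and after permuting one feature, and the differences are averaged over trees. This is an illustration under stated assumptions, not the authors' setup: the "trees" are depth-1 stumps on a randomly chosen feature, the number of trees is fixed rather than random, and all names (`fit_stump`, `predict_stump`, `oob_permutation_importance`) and the toy data are hypothetical.

```python
import numpy as np


def fit_stump(X, y, rng):
    """Fit a depth-1 regression tree on one randomly chosen feature."""
    j = rng.integers(X.shape[1])          # random feature, as a stand-in for mtry
    thr = np.median(X[:, j])              # split at the median of that feature
    left = X[:, j] <= thr
    # Fall back to the overall mean if one side of the split is empty.
    mean_l = y[left].mean() if left.any() else y.mean()
    mean_r = y[~left].mean() if (~left).any() else y.mean()
    return j, thr, mean_l, mean_r


def predict_stump(stump, X):
    j, thr, mean_l, mean_r = stump
    return np.where(X[:, j] <= thr, mean_l, mean_r)


def oob_permutation_importance(X, y, n_trees=200, seed=0):
    """Average over trees of (permuted OOB MSE - original OOB MSE), per feature."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    diffs = np.zeros((n_trees, p))
    for t in range(n_trees):
        boot = rng.integers(n, size=n)               # bootstrap sample indices
        oob = np.setdiff1d(np.arange(n), boot)       # out-of-bag indices
        if oob.size == 0:
            continue                                 # this tree contributes zeros
        stump = fit_stump(X[boot], y[boot], rng)
        X_oob, y_oob = X[oob], y[oob]
        base = np.mean((y_oob - predict_stump(stump, X_oob)) ** 2)
        for j in range(p):
            X_perm = X_oob.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break the link to y
            perm = np.mean((y_oob - predict_stump(stump, X_perm)) ** 2)
            diffs[t, j] = perm - base
    return diffs.mean(axis=0)


# Toy data (an assumption for illustration): only feature 0 drives the response.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(500, 3))
y = 2.0 * X[:, 0] + data_rng.normal(size=500)

importance = oob_permutation_importance(X, y)
print(importance)  # feature 0 should stand out; features 1 and 2 should be near zero
```

In this sketch the importance of a feature is an average of per-tree differences, which is the kind of averaged statistic whose fluctuations a Central Limit Theorem describes; the precise estimator and conditions are those stated in the paper, not the ones chosen here.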

Taxonomy

Topics: Bayesian Methods and Mixture Models · Probability and Risk Models · Statistical Distribution Estimation and Applications