FACT: High-Dimensional Random Forests Inference
Chien-Ming Chi, Yingying Fan, Jinchi Lv

TL;DR
This paper introduces FACT, a bias-resistant hypothesis testing framework for assessing feature importance in high-dimensional random forests, supported by theoretical guarantees and demonstrated through simulations and an economic forecasting case.
Contribution
It proposes the self-normalized feature-residual correlation test (FACT), a bias-resistant test of feature significance in high-dimensional random forests.
Findings
FACT controls type I error effectively.
FACT demonstrates high power in simulations.
Application to economic data shows practical utility.
Abstract
Quantifying the usefulness of individual features in random forests learning can greatly enhance its interpretability. Existing studies have shown that some popularly used feature importance measures for random forests suffer from bias. In addition, comprehensive size and power analyses are lacking for most of these existing methods. In this paper, we approach the problem via hypothesis testing, and suggest a framework of the self-normalized feature-residual correlation test (FACT) for evaluating the significance of a given feature in the random forests model with a bias-resistance property, where our null hypothesis concerns whether the feature is conditionally independent of the response given all other features. Such an endeavor on random forests inference is empowered by some recent developments on high-dimensional random forests consistency. Under a fairly general…
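To make the feature-residual correlation idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' procedure: the full FACT construction is not given in the excerpt above and may differ substantially. The sketch assumes a model has already been fitted to the response using all features except the one under test, on a separate sample, and that its out-of-sample residuals are available. If the tested feature is conditionally independent of the response given the others, those residuals should be roughly uncorrelated with it. The function name self_normalized_corr_test, its arguments, and the standard normal reference distribution are all hypothetical choices made for this example.

```python
import math
import random
from statistics import NormalDist, fmean


def self_normalized_corr_test(feature, residuals):
    """Two-sided test of zero correlation between a feature and residuals.

    feature   : values of the feature under test (hypothetical input).
    residuals : out-of-sample residuals, assumed to come from a model
                fitted on a separate sample WITHOUT the tested feature.
    Returns (statistic, p_value), using a standard normal reference
    (an assumption of this sketch, not a result quoted from the paper).
    """
    if len(feature) != len(residuals) or len(feature) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    x_bar = fmean(feature)
    # Products of centered feature values and residuals.
    z = [(x - x_bar) * r for x, r in zip(feature, residuals)]
    denom = math.sqrt(sum(v * v for v in z))
    if denom == 0.0:
        raise ValueError("degenerate sample: all products are zero")
    # Self-normalization: the sum is scaled by its own root sum of squares,
    # so no separate variance estimate is needed.
    stat = sum(z) / denom
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(stat)))
    return stat, p_value


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.gauss(0, 1) for _ in range(500)]
    noise = [rng.gauss(0, 1) for _ in range(500)]
    # Residuals unrelated to the feature.
    print(self_normalized_corr_test(x, noise))
    # Residuals that still depend on the feature.
    print(self_normalized_corr_test(x, [xi + e for xi, e in zip(x, noise)]))
```

In practice the residuals would come from a random forest trained on held-out data, and the quality of that fit determines whether the normal reference is reasonable; the sketch leaves that step to the caller.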
Taxonomy
Topics: Statistical Methods and Inference · Probabilistic and Robust Engineering Design · Neural Networks and Applications
Methods: Test
