Asymptotic Normality for Multivariate Random Forest Estimators
Kevin Li

TL;DR
This paper extends asymptotic normality results for random forest estimators to the multivariate case: estimates at multiple points are jointly normal with a diagonal limiting covariance matrix. It also examines the split-stability condition that the result relies on.
Contribution
It establishes the multivariate asymptotic normality of random forest estimates and analyzes their covariance structure, including conditions under which estimates at different points are independent.
Findings
The vector of estimates at multiple points is asymptotically jointly normal with a diagonal limiting covariance matrix.
The off-diagonal covariance terms are bounded by the probability that the two points fall in the same cell of the tree's partition.
Numerical simulations confirm the covariance bounds and coverage rates of confidence intervals.
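One way to build intuition for the second finding is to estimate the same-leaf probability directly. The Python sketch below does this by Monte Carlo under a hypothetical toy model that is not the paper's splitting rule: training points are uniform on [0, 1]^d, each split picks a coordinate uniformly at random and cuts the current cell at a uniformly random position, and a cell stops splitting once it holds at most `leaf_size` points. Only the path shared by the two query points is grown, since the rest of the tree cannot affect whether they end in the same leaf. The names `share_leaf` and `same_leaf_probability` are illustrative.

```python
import random


def share_leaf(x, x_prime, data, leaf_size, rng):
    """Follow x and x_prime down one randomly grown tree.

    Hypothetical toy split rule (not the paper's): choose a coordinate
    uniformly at random, cut the current cell at a uniformly random
    position, and stop once the cell holds <= leaf_size training points.
    Returns True if both query points end in the same leaf.
    """
    d = len(x)
    lo, hi = [0.0] * d, [1.0] * d  # current cell, starting from [0, 1]^d
    pts = data
    while len(pts) > leaf_size:
        j = rng.randrange(d)
        cut = rng.uniform(lo[j], hi[j])
        if (x[j] <= cut) != (x_prime[j] <= cut):
            return False  # this split separates the two query points
        # Both points go to the same child: shrink the cell and its data.
        if x[j] <= cut:
            hi[j] = cut
            pts = [p for p in pts if p[j] <= cut]
        else:
            lo[j] = cut
            pts = [p for p in pts if p[j] > cut]
    return True


def same_leaf_probability(x, x_prime, n=500, leaf_size=5, n_trees=1000, seed=0):
    """Monte Carlo estimate of P(x and x_prime share a leaf) over random trees,
    each grown on a fresh uniform sample of n training points."""
    rng = random.Random(seed)
    d = len(x)
    hits = 0
    for _ in range(n_trees):
        data = [[rng.random() for _ in range(d)] for _ in range(n)]
        hits += share_leaf(x, x_prime, data, leaf_size, rng)
    return hits / n_trees
```

Under this toy model, `same_leaf_probability((0.5, 0.5), (0.51, 0.51))` is noticeably positive, while for `(0.1, 0.1)` and `(0.9, 0.9)` it is essentially zero: any cell containing both points still holds far more than `leaf_size` points and keeps being split. Lowering `leaf_size` (deeper trees) can only add splits along a given path, so in expectation it lowers the probability, in line with the abstract's remark about sufficiently deep trees.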
Abstract
Regression trees and random forests are popular and effective non-parametric estimators in practical applications. A recent paper by Athey and Wager shows that the random forest estimate at any point is asymptotically Gaussian; in this paper, we extend this result to the multivariate case and show that the vector of estimates at multiple points is jointly normal. Specifically, the covariance matrix of the limiting normal distribution is diagonal, so the estimates at any two points are asymptotically independent in sufficiently deep trees. Moreover, the off-diagonal terms are bounded by quantities capturing how likely two points are to belong to the same cell of the resulting tree's partition. Our results rely on a certain stability property when constructing splits, and we give examples of splitting rules for which this assumption is and is not satisfied. We test our proposed covariance bound and the…
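Joint normality with a diagonal covariance matrix means the estimates at different points are asymptotically independent, so marginal intervals can be combined into a simultaneous band using the exact Šidák correction rather than the more conservative Bonferroni one. A minimal Python sketch, assuming point estimates and per-point standard errors are already available from some variance estimator (how they are obtained is outside this sketch); the function names are hypothetical, and the coverage check draws independent Gaussians as a stand-in for forest estimates.

```python
import random
from statistics import NormalDist


def sidak_level(alpha, k):
    """Per-point miscoverage beta such that k independent intervals cover
    jointly with probability exactly 1 - alpha: (1 - beta) ** k == 1 - alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / k)


def joint_intervals(estimates, std_errors, alpha=0.05):
    """Simultaneous normal intervals at k query points, assuming the limiting
    covariance is diagonal (asymptotically independent estimates)."""
    k = len(estimates)
    beta = sidak_level(alpha, k)
    z = NormalDist().inv_cdf(1.0 - beta / 2.0)  # two-sided critical value
    return [(m - z * s, m + z * s) for m, s in zip(estimates, std_errors)]


def joint_coverage(k=5, alpha=0.05, reps=20000, seed=1):
    """Fraction of replications in which all k intervals cover the truth,
    using independent N(0, 1) draws as stand-in estimates of a zero truth."""
    rng = random.Random(seed)
    truth = [0.0] * k
    covered = 0
    for _ in range(reps):
        est = [rng.gauss(0.0, 1.0) for _ in range(k)]
        ints = joint_intervals(est, [1.0] * k, alpha)
        covered += all(lo <= t <= hi for (lo, hi), t in zip(ints, truth))
    return covered / reps
```

With k = 5 and α = 0.05 the per-point level is β ≈ 0.0102, slightly less conservative than Bonferroni's 0.01, and `joint_coverage()` should land close to 0.95. If the off-diagonal terms are not negligible at a finite sample size (for instance, for nearby points that often share a leaf), the independence-based level is only approximate.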
Taxonomy
Topics: Statistical Methods and Inference · Statistical Methods and Bayesian Inference · Probabilistic and Robust Engineering Design
