Stability of Random Forests and Coverage of Random-Forest Prediction Intervals
Yan Wang, Huaiqing Wu, Dan Nettleton

TL;DR
This paper proves that random forests are stable when the squared response is not heavy-tailed, and uses that stability to derive non-asymptotic bounds on the coverage probability of prediction intervals built from out-of-bag errors.
Contribution
The paper establishes stability for the practical version of random forests under a mild tail condition on the squared response, and derives non-asymptotic lower and upper bounds on the coverage probability of out-of-bag prediction intervals.
Findings
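Random forests are stable when the squared response does not have a heavy tail.
Prediction intervals built from out-of-bag errors have a non-asymptotic lower bound on coverage, with a complementary upper bound under a further mild condition.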
Empirical results suggest stability persists even beyond theoretical assumptions.
Abstract
We establish stability of random forests under the mild condition that the squared response ($Y^2$) does not have a heavy tail. In particular, our analysis holds for the practical version of random forests that is implemented in popular packages like \texttt{randomForest} in \texttt{R}. Empirical results show that stability may persist even beyond our assumption and hold for heavy-tailed $Y^2$. Using the stability property, we prove a non-asymptotic lower bound for the coverage probability of prediction intervals constructed from the out-of-bag error of random forests. With another mild condition that is typically satisfied when $Y$ is continuous, we also establish a complementary upper bound, which can be similarly established for the jackknife prediction interval constructed from an arbitrary stable algorithm. We also discuss the asymptotic coverage probability under assumptions…
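To make the idea of an interval "constructed from the out-of-bag error" concrete, here is a minimal Python sketch. It is not the authors' implementation and does not use the \texttt{randomForest} package. As stated assumptions, it substitutes a bagged ensemble of one-split regression stumps on a single feature for a full random forest, uses absolute out-of-bag errors, and picks the interval radius with the index rule ceil((1 - alpha)(m + 1)). The function names `fit_stump`, `predict_stump`, `fit_oob_forest`, and `predict_interval` are hypothetical.

```python
import math
import random
from statistics import fmean


def fit_stump(xs, ys):
    """Fit a one-split least-squares regression stump on a 1-D feature."""
    pairs = sorted(zip(xs, ys))
    overall = fmean(ys)
    # (sse, threshold, left mean, right mean); falls back to a constant fit.
    best = (math.inf, math.inf, overall, overall)
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        ml, mr = fmean(left), fmean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if sse < best[0]:
            best = (sse, (pairs[i][0] + pairs[i - 1][0]) / 2, ml, mr)
    return best[1:]


def predict_stump(stump, x):
    threshold, ml, mr = stump
    return ml if x <= threshold else mr


def fit_oob_forest(xs, ys, alpha=0.1, n_trees=30, seed=0):
    """Bag stumps and compute the interval radius from out-of-bag errors.

    Assumption: the radius is the k-th smallest absolute out-of-bag error
    with k = ceil((1 - alpha) * (m + 1)), capped at m.
    """
    rng = random.Random(seed)
    n = len(xs)
    stumps, bags = [], []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
        bags.append(set(idx))
    abs_errors = []
    for i in range(n):
        # Average only the stumps whose bootstrap sample did not contain point i.
        preds = [predict_stump(s, xs[i]) for s, bag in zip(stumps, bags) if i not in bag]
        if preds:
            abs_errors.append(abs(ys[i] - fmean(preds)))
    abs_errors.sort()
    m = len(abs_errors)
    k = min(m, math.ceil((1 - alpha) * (m + 1)))
    return stumps, abs_errors[k - 1]


def predict_interval(model, x_new):
    """Return (lower, upper): ensemble prediction plus or minus the OOB radius."""
    stumps, radius = model
    center = fmean([predict_stump(s, x_new) for s in stumps])
    return center - radius, center + radius
```

A typical use is `model = fit_oob_forest(xs, ys, alpha=0.1)` followed by `predict_interval(model, x_new)`. Because each training point is scored only by ensemble members that never saw it, the out-of-bag errors act as held-out residuals without a separate calibration split.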
Taxonomy
Topics: Probabilistic and Robust Engineering Design · Adversarial Robustness in Machine Learning · Machine Learning and Algorithms
