Confidence intervals for the random forest generalization error
Paulo C. Marques F

TL;DR
This paper shows how to compute confidence intervals for the random forest generalization error from the byproducts of standard training, with no additional data splitting or model retraining. In simulations, the intervals show good coverage, and their width shrinks at an appropriate rate as the training sample size grows.
Contribution
It provides a low-cost way to derive confidence intervals for the random forest generalization error directly from the byproducts of training, the same outputs that already yield the out-of-bag point estimate.
Findings
Confidence intervals have good coverage in simulations.
Interval width shrinks at an appropriate rate as the training sample size grows.
The method avoids data splitting and model retraining.
Abstract
We show that the byproducts of the standard training process of a random forest yield not only the well-known and almost computationally free out-of-bag point estimate of the model generalization error, but also give a direct path to compute confidence intervals for the generalization error which avoids processes of data splitting and model retraining. Besides the low computational cost involved in their construction, these confidence intervals are shown through simulations to have good coverage and appropriate shrinking rate of their width in terms of the training sample size.
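To make the idea of reusing training byproducts concrete, the sketch below computes out-of-bag (OOB) 0-1 losses from a small bagged ensemble and wraps their mean in a naive normal-approximation interval. It is an illustration under stated assumptions, not the construction proposed in the paper: one-dimensional threshold stumps stand in for trees, the OOB losses are treated as if they were independent (they are not, since the ensembles that produce them overlap), and every name (`make_data`, `fit_stump`, `oob_losses`, `normal_interval`) is hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def make_data(n, seed=1):
    """Toy binary problem: y = 1 when x > 0.5, with about 20% of labels flipped."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [int((x > 0.5) != (rng.random() < 0.2)) for x in xs]
    return xs, ys


def fit_stump(xs, ys):
    """Return the threshold t that minimizes the error of 'predict 1 if x > t'."""
    best_t, best_err = xs[0], float("inf")
    for t in xs:
        err = sum((x > t) != bool(y) for x, y in zip(xs, ys))
        if err < best_err:
            best_t, best_err = t, err
    return best_t


def oob_losses(xs, ys, n_trees=50, seed=0):
    """0-1 loss per observation, using only stumps whose bootstrap sample missed it."""
    rng = random.Random(seed)
    n = len(xs)
    votes = [[0, 0] for _ in range(n)]  # votes[i][c]: OOB votes for class c
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        t = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        in_bag = set(idx)
        for i in range(n):
            if i not in in_bag:
                votes[i][int(xs[i] > t)] += 1
    losses = []
    for (v0, v1), y in zip(votes, ys):
        if v0 + v1 == 0:
            continue  # observation was never out of bag; skip it
        losses.append(int(int(v1 > v0) != y))
    return losses


def normal_interval(losses, level=0.95):
    """Naive interval: mean +/- z * s / sqrt(n), clipped to [0, 1]."""
    n = len(losses)
    center = mean(losses)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * stdev(losses) / math.sqrt(n)
    return max(0.0, center - half), min(1.0, center + half)


xs, ys = make_data(200)
losses = oob_losses(xs, ys)
lo, hi = normal_interval(losses)
print(f"OOB error estimate: {mean(losses):.3f}, naive 95% interval: [{lo:.3f}, {hi:.3f}]")
```

Everything here comes from a single training pass: no holdout set is carved out and no model is refit. Rerunning with a larger `n` in `make_data` is a simple way to watch the interval narrow, although the coverage of this naive interval is not guaranteed and is not a substitute for the simulations reported by the author.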
Taxonomy
Topics: Landslides and related hazards · Neural Networks and Applications · Gaussian Processes and Bayesian Inference
