Confidence intervals for the random forest generalization error

Paulo C. Marques F

arXiv:2112.06101 · stat.ML · March 14, 2022

TL;DR

This paper shows how to compute confidence intervals for the random forest generalization error from the byproducts of standard training, with no additional data splitting or model retraining. In simulations, the intervals show good coverage and shrink at an appropriate rate as the training sample grows.

Contribution

A low-cost way to derive confidence intervals for the random forest generalization error directly from the byproducts of training, complementing the familiar out-of-bag point estimate.

Findings

01. Confidence intervals have good coverage in simulations.

02. Intervals shrink appropriately with larger training samples.

03. Method avoids data splitting and retraining.

Abstract

We show that the byproducts of the standard training process of a random forest yield not only the well-known and almost computationally free out-of-bag point estimate of the model generalization error, but also give a direct path to compute confidence intervals for the generalization error which avoids processes of data splitting and model retraining. Besides the low computational cost involved in their construction, these confidence intervals are shown through simulations to have good coverage and appropriate shrinking rate of their width in terms of the training sample size.
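To make the out-of-bag idea in the abstract concrete, here is a minimal sketch of the out-of-bag point estimate for a bagged ensemble. It is an illustration under stated assumptions, not the paper's interval construction and not the rangerror package: one-dimensional decision stumps stand in for full trees, and the names `fit_stump`, `predict_stump`, and `oob_error` are hypothetical. Each observation is scored only by the learners whose bootstrap sample did not contain it, so no separate test split is needed.

```python
import random


def fit_stump(xs, ys):
    """Fit a 1-D decision stump: return (threshold, label_right) with minimal training error."""
    best = None
    for t in sorted(set(xs)):
        for right in (0, 1):
            errors = sum((right if x >= t else 1 - right) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, right)
    return best[1], best[2]


def predict_stump(stump, x):
    """Predict label_right at or above the threshold, the other label below it."""
    t, right = stump
    return right if x >= t else 1 - right


def oob_error(xs, ys, n_estimators=200, seed=0):
    """Out-of-bag misclassification rate of a bagged ensemble of stumps.

    Returns (error, losses), where losses holds the 0/1 out-of-bag loss of
    every observation that was left out of at least one bootstrap sample.
    """
    rng = random.Random(seed)
    n = len(xs)
    votes = [[0, 0] for _ in range(n)]  # votes[i][c]: OOB votes for class c on point i
    for _ in range(n_estimators):
        bag = [rng.randrange(n) for _ in range(n)]  # bootstrap sample, with replacement
        in_bag = set(bag)
        stump = fit_stump([xs[i] for i in bag], [ys[i] for i in bag])
        for i in range(n):
            if i not in in_bag:  # only learners that never saw point i may vote on it
                votes[i][predict_stump(stump, xs[i])] += 1
    scored = [i for i in range(n) if sum(votes[i]) > 0]
    # Majority vote among OOB learners; ties are resolved in favor of class 0.
    losses = [int((1 if votes[i][1] > votes[i][0] else 0) != ys[i]) for i in scored]
    return sum(losses) / len(losses), losses


# Toy data (an assumption for illustration): labels switch from 0 to 1 at x = 0.5.
xs = [i / 60 for i in range(60)]
ys = [int(x >= 0.5) for x in xs]
error, losses = oob_error(xs, ys)
print(f"OOB misclassification rate: {error:.3f} over {len(losses)} points")
```

The per-observation losses returned alongside the point estimate are the kind of training byproduct the abstract refers to; how the paper turns such byproducts into a confidence interval is not reproduced here.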

Code & Models

Repositories

paulocmarquesf/rangerror (Official)

Taxonomy

Topics: Landslides and related hazards · Neural Networks and Applications · Gaussian Processes and Bayesian Inference