Prediction Error Estimation in Random Forests

Ian Krupkin; Johanna Hardin

arXiv:2309.00736 · stat.ML · August 9, 2024



TL;DR

This paper evaluates several error estimation methods for classification Random Forests and finds that their error estimates are generally closer to the true error rate than to the expected (average) prediction error, the opposite of what Bates et al. (2023) found for logistic regression.

Contribution

It extends the error estimation framework of Bates et al. (2023) to classification Random Forests and assesses how closely several estimation strategies (cross-validation, bagging, and data splitting) track the true error rate.
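The two targets being contrasted can be written out as follows. This is a sketch in the notation of Bates et al. (2023), assuming a classifier \hat f fit on n training pairs (X, Y), a fresh pair (X_{n+1}, Y_{n+1}) drawn from the same distribution, and 0-1 loss. The squared-distance comparison on the last line is one natural reading of "closer on average"; the paper's exact notation and criterion may differ.

```latex
\ell(\hat y, y) = \mathbf{1}\{\hat y \neq y\}, \qquad
\mathrm{Err}_{XY} = \mathbb{E}\!\left[\ell\big(\hat f(X_{n+1}), Y_{n+1}\big) \,\middle|\, (X, Y)\right], \qquad
\mathrm{Err} = \mathbb{E}\!\left[\mathrm{Err}_{XY}\right]

\widehat{\mathrm{Err}} \text{ is closer to } \mathrm{Err}_{XY} \text{ than to } \mathrm{Err}
\iff
\mathbb{E}\!\left[\big(\widehat{\mathrm{Err}} - \mathrm{Err}_{XY}\big)^2\right]
< \mathbb{E}\!\left[\big(\widehat{\mathrm{Err}} - \mathrm{Err}\big)^2\right]
```

Here Err_XY is the error of the particular model fit to the observed data, while Err averages that quantity over hypothetical training sets of size n; \widehat{Err} stands for any of the cross-validation, bagging (out-of-bag), or data-splitting estimates.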

Findings

01

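Random Forest error estimates are, on average, closer to the true error rate than to the expected (average) prediction error.

02

This is the opposite of the findings of Bates et al. (2023) for logistic regression.

03

The result holds across error estimation strategies, including cross-validation, bagging, and data splitting.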

Abstract

In this paper, error estimates of classification Random Forests are quantitatively assessed. Building on the theoretical framework of Bates et al. (2023), the true error rate and the expected error rate are investigated theoretically and empirically for a variety of error estimation methods common to Random Forests. We show that in the classification case, Random Forests' estimates of prediction error are, on average, closer to the true error rate than to the average prediction error. This is the opposite of the findings of Bates et al. (2023), which are given for logistic regression. We further show that this result holds across different error estimation strategies such as cross-validation, bagging, and data splitting.
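The comparison described in the abstract can be mimicked with a small simulation. The sketch below is not the authors' code; it assumes scikit-learn, a hypothetical logistic data-generating process in `simulate`, and illustrative settings (5 folds, a 70/30 split, and the hypothetical parameters `n_trees` and `n_test`). For each simulated training set it approximates the true error rate Err_XY by scoring the full-data forest on a large fresh sample, computes the out-of-bag, 5-fold cross-validation, and data-splitting estimates, and reports each estimator's mean squared distance to Err_XY and to Err (approximated by averaging Err_XY over replicates).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split


def simulate(n, p, rng):
    """Hypothetical data-generating process: logistic model, first 3 features informative."""
    X = rng.normal(size=(n, p))
    beta = np.zeros(p)
    beta[:3] = 1.0
    prob = 1.0 / (1.0 + np.exp(-X @ beta))
    y = (rng.uniform(size=n) < prob).astype(int)
    return X, y


def one_replicate(n, p, rng, n_trees, n_test):
    """Draw one training set; return Err_XY and three estimates of prediction error."""
    X, y = simulate(n, p, rng)
    seed = int(rng.integers(1_000_000))

    # Forest fit on all n points; its out-of-bag (OOB) error is the "bagging" estimate.
    rf = RandomForestClassifier(n_estimators=n_trees, oob_score=True, random_state=seed)
    rf.fit(X, y)

    # Err_XY: misclassification rate of *this* fitted forest, approximated
    # on a large fresh sample from the same distribution.
    X_new, y_new = simulate(n_test, p, rng)
    err_xy = np.mean(rf.predict(X_new) != y_new)

    oob = 1.0 - rf.oob_score_  # oob_score_ is OOB accuracy

    # 5-fold cross-validation: refits a forest on each training fold.
    cv_acc = cross_val_score(
        RandomForestClassifier(n_estimators=n_trees, random_state=seed), X, y, cv=5
    )
    cv = 1.0 - cv_acc.mean()

    # Data splitting: fit on 70% of the data, score on the held-out 30%.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)
    split_rf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    split = np.mean(split_rf.fit(X_tr, y_tr).predict(X_te) != y_te)

    return err_xy, {"oob": oob, "cv": cv, "split": split}


def compare(n=200, p=10, reps=50, n_trees=200, n_test=20_000, seed=0):
    """Mean squared distance of each estimator to Err_XY and to Err over `reps` datasets."""
    rng = np.random.default_rng(seed)
    runs = [one_replicate(n, p, rng, n_trees, n_test) for _ in range(reps)]
    err_xy = np.array([r[0] for r in runs])
    err = err_xy.mean()  # Monte Carlo approximation of Err = E[Err_XY]
    summary = {}
    for name in ("oob", "cv", "split"):
        est = np.array([r[1][name] for r in runs])
        summary[name] = {
            "mse_vs_err_xy": float(np.mean((est - err_xy) ** 2)),
            "mse_vs_err": float(np.mean((est - err) ** 2)),
        }
    return summary
```

Calling `compare()` returns a dict keyed by `"oob"`, `"cv"`, and `"split"`; a smaller `mse_vs_err_xy` than `mse_vs_err` would mean that estimator tracks the true error rate more closely than the expected error rate for this hypothetical setup. The cross-validation and data-splitting estimates score forests fit on fewer than n points, while Err_XY refers to the forest fit on all n points, as in the Bates et al. (2023) framing. The defaults fit several hundred forests and may take minutes; lowering `reps` or `n_trees` gives a quicker, noisier run.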


Code & Models

Repositories

iankrupkin/Prediction-Error-Estimation-in-Random-Forests (official)


Taxonomy

Topics: Machine Learning and Data Classification · Hydrological Forecasting Using AI · Neural Networks and Applications