Sequential Bootstrap for Out-of-Bag Error Estimation: A Simulation-Based Replication and Stability-Oriented Refinement
Cheng Peng

TL;DR
This paper studies Sequential Bootstrap, a controlled resampling method that fixes the number of distinct observations in every replicate, to measure how that source of randomness affects out-of-bag (OOB) error estimates and their variance in ensemble methods.
Contribution
It presents Sequential Bootstrap as a tool for isolating how variability in the number of distinct observations per resample affects OOB estimators, clarifying one source of variance in bootstrap methods.
Findings
Switching to Sequential Bootstrap does not affect accuracy metrics.
Sequential Bootstrap reduces variance-related measures in a data-dependent manner.
The method offers a reproducible framework for studying bootstrap estimator properties.
Abstract
Bootstrap resampling is the foundation of many ensemble learning methods, and out-of-bag (OOB) error estimation is the most widely used internal measure of generalization performance. In the standard multinomial bootstrap, the number of distinct observations in each resample is random. Although this source of variability exists, it has rarely been studied in isolation to understand how much it affects OOB-based quantities. To address this gap, we investigate Sequential Bootstrap, a resampling method that forces every bootstrap replicate to contain the same number of distinct observations, and treat it as a controlled modification of the classical bootstrap within the OOB framework. We reproduce Breiman's five original OOB experiments on both synthetic and real-world datasets, repeating all analyses across many different random seeds. Our results show that switching from the classical…
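The contrast the abstract draws between the two resampling schemes can be sketched in a few lines. This is a minimal illustration, not the paper's implementation. It assumes one common formulation of sequential resampling: draw indices with replacement until a target number of distinct observations has been reached. The default target, floor(n(1 - 1/e)) + 1, is an assumption chosen to match the expected distinct fraction of the classical bootstrap; the function names and the `n_distinct` parameter are hypothetical.

```python
import math
import random


def classical_bootstrap(n, rng):
    """Draw n indices with replacement; the number of distinct indices is random."""
    return [rng.randrange(n) for _ in range(n)]


def sequential_bootstrap(n, rng, n_distinct=None):
    """Draw with replacement until exactly n_distinct distinct indices are seen.

    The default target is an assumption made for this sketch, not a value
    taken from the paper.
    """
    if n_distinct is None:
        n_distinct = math.floor(n * (1 - math.exp(-1))) + 1
    if not 1 <= n_distinct <= n:
        raise ValueError("n_distinct must be between 1 and n")
    sample, seen = [], set()
    while len(seen) < n_distinct:
        i = rng.randrange(n)
        sample.append(i)
        seen.add(i)
    return sample


def oob_indices(n, sample):
    """Indices never drawn into the resample: the out-of-bag set."""
    in_bag = set(sample)
    return [i for i in range(n) if i not in in_bag]


if __name__ == "__main__":
    rng = random.Random(0)
    n = 100
    for name, draw in [("classical", classical_bootstrap),
                       ("sequential", sequential_bootstrap)]:
        # OOB set sizes over a few replicates of each scheme.
        sizes = [len(oob_indices(n, draw(n, rng))) for _ in range(5)]
        print(name, "OOB sizes:", sizes)
```

Under the sequential scheme the OOB set has the same size in every replicate, while the total number of draws varies; under the classical scheme it is the other way around. That is the single source of variability the controlled comparison isolates.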
Taxonomy
Topics: Machine Learning and Data Classification · Imbalanced Data Classification Techniques · Explainable Artificial Intelligence (XAI)
