Extrapolated cross-validation for randomized ensembles
Jin-Hong Du, Pratik Patil, Kathryn Roeder, Arun Kumar Kuchibhotla

TL;DR
This paper introduces ECV, an extrapolated cross-validation method for efficiently tuning ensemble and subsample sizes in randomized ensemble methods, providing theoretical guarantees and practical advantages over traditional CV methods.
Contribution
The paper presents a novel risk extrapolation technique for tuning ensemble parameters, with proven consistency and applicability to high-dimensional settings.
Findings
ECV achieves higher accuracy than traditional CV methods.
ECV reduces computational cost compared to sample-split CV.
Theoretical guarantees hold for high-dimensional regimes.
Abstract
Ensemble methods such as bagging and random forests are ubiquitous in various fields, from finance to genomics. Despite their prevalence, the question of the efficient tuning of ensemble parameters has received relatively little attention. This paper introduces a cross-validation method, ECV (Extrapolated Cross-Validation), for tuning the ensemble and subsample sizes in randomized ensembles. Our method builds on two primary ingredients: initial estimators for small ensemble sizes using out-of-bag errors and a novel risk extrapolation technique that leverages the structure of prediction risk decomposition. By establishing uniform consistency of our risk extrapolation technique over ensemble and subsample sizes, we show that ECV yields δ-optimal (with respect to the oracle-tuned risk) ensembles for squared prediction risk. Our theory accommodates general ensemble predictors, only…
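The risk extrapolation idea mentioned in the abstract can be illustrated with a short sketch. It assumes the standard squared-error identity for an average of M exchangeable predictors, R_M = R_inf + (R_1 - R_inf) / M, which implies R_inf = 2 R_2 - R_1, so risk estimates at M = 1 and M = 2 determine the whole curve. The function names, the `tol` parameter, and the numeric inputs below are hypothetical and chosen only for illustration; the abstract does not spell out the exact estimator, so this should not be read as the authors' implementation.

```python
import math


def extrapolate_risk(r1: float, r2: float, m: int) -> float:
    """Extrapolate squared prediction risk to ensemble size m.

    r1, r2: risk estimates for ensemble sizes 1 and 2 (for example,
    obtained from out-of-bag errors; assumed given here).
    """
    if m < 1:
        raise ValueError("ensemble size m must be at least 1")
    r_inf = 2.0 * r2 - r1  # implied limiting risk as m grows
    return r_inf + 2.0 * (r1 - r2) / m


def smallest_ensemble_size(r1: float, r2: float, tol: float) -> int:
    """Smallest m whose extrapolated risk is within tol of the limit.

    Hypothetical helper: solves 2 * (r1 - r2) / m <= tol for m.
    """
    if tol <= 0:
        raise ValueError("tol must be positive")
    gap = 2.0 * (r1 - r2)
    if gap <= 0:
        return 1  # no estimated benefit from a larger ensemble
    return max(1, math.ceil(gap / tol))


# Illustrative inputs only (not results from the paper).
r1_hat, r2_hat = 1.0, 0.75
curve = {m: extrapolate_risk(r1_hat, r2_hat, m) for m in (1, 2, 4, 8)}
m_star = smallest_ensemble_size(r1_hat, r2_hat, tol=0.125)
```

Under this sketch, only ensembles of size one and two need to be fitted and evaluated; risks for larger M are read off the extrapolated curve rather than estimated by refitting.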
Taxonomy
Topics: Single-cell and spatial transcriptomics · Statistical Methods and Inference · Gene expression and cancer classification
