Corrected generalized cross-validation for finite ensembles of penalized estimators
Pierre C. Bellec, Jin-Hong Du, Takuya Koriyama, Pratik Patil, Kai Tan

TL;DR
This paper identifies a flaw in the widely-used GCV method for finite ensembles of penalized estimators and proposes a corrected version, CGCV, that is consistent and retains computational efficiency, supported by theoretical analysis.
Contribution
The paper introduces CGCV, a corrected GCV method for finite ensembles, ensuring consistency and computational efficiency, with theoretical validation under Gaussian and general settings.
Findings
GCV is inconsistent for any finite ensemble of size greater than one.
CGCV provides a consistent risk estimator for penalized ensemble methods.
Theoretical analysis confirms CGCV's uniform consistency under various conditions.
Abstract
Generalized cross-validation (GCV) is a widely-used method for estimating the squared out-of-sample prediction risk that employs a scalar degrees of freedom adjustment (in a multiplicative sense) to the squared training error. In this paper, we examine the consistency of GCV for estimating the prediction risk of arbitrary ensembles of penalized least-squares estimators. We show that GCV is inconsistent for any finite ensemble of size greater than one. Towards repairing this shortcoming, we identify a correction that involves an additional scalar correction (in an additive sense) based on degrees of freedom adjusted training errors from each ensemble component. The proposed estimator (termed CGCV) maintains the computational advantages of GCV and requires neither sample splitting, model refitting, nor out-of-bag risk estimation. The estimator stems from a finer inspection of the ensemble…
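To make the "multiplicative degrees of freedom adjustment" concrete, here is a minimal sketch of the classical GCV estimate for a single ridge estimator, the size-one case. It is not the paper's CGCV: the additive correction for ensembles is not spelled out in this excerpt and is not reproduced here. The function name ridge_gcv, the parameter lam, the 1/n scaling of the ridge objective, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def ridge_gcv(X, y, lam):
    """Classical GCV for one ridge fit (illustrative; not the paper's CGCV).

    Assumed objective: ||y - X b||^2 / n + lam * ||b||^2.
    Returns (gcv, train_err, df).
    """
    n, p = X.shape
    gram = X.T @ X + n * lam * np.eye(p)
    beta = np.linalg.solve(gram, X.T @ y)
    # Degrees of freedom: trace of the smoother matrix X (X'X + n*lam*I)^{-1} X',
    # computed via the equivalent p x p trace.
    df = np.trace(np.linalg.solve(gram, X.T @ X))
    train_err = np.mean((y - X @ beta) ** 2)
    # Multiplicative adjustment: training error divided by (1 - df/n)^2.
    gcv = train_err / (1.0 - df / n) ** 2
    return gcv, train_err, df


# Hypothetical simulated data, for illustration only.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
beta_true = rng.standard_normal(p) / np.sqrt(p)
y = X @ beta_true + rng.standard_normal(n)

gcv, train_err, df = ridge_gcv(X, y, lam=0.1)
print(f"df = {df:.2f}, training error = {train_err:.3f}, GCV = {gcv:.3f}")
```

Because 0 < df/n < 1 here, the GCV value is always larger than the raw training error. The paper's claim concerns what happens when this same recipe is applied to an ensemble of more than one such estimator.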
Taxonomy
Topics: Advanced Statistical Methods and Models · Statistical Methods and Inference · Statistical Methods and Bayesian Inference
