V-fold cross-validation improved: V-fold penalization
Sylvain Arlot (LM-Orsay, INRIA Futurs)

TL;DR
This paper introduces V-fold penalization, a model selection procedure that improves on V-fold cross-validation, particularly in heteroscedastic regression. It is backed by theoretical guarantees, and simulations show better prediction performance.
Contribution
It proposes V-fold penalization, a new model selection procedure that outperforms V-fold cross-validation in non-asymptotic settings and adapts to heteroscedastic noise.
Findings
V-fold penalization satisfies a non-asymptotic oracle inequality.
It adapts to the smoothness of the regression function.
Simulation results show significant improvement over V-fold cross-validation.
Abstract
We study the efficiency of V-fold cross-validation (VFCV) for model selection from the non-asymptotic viewpoint, and suggest an improvement on it, which we call "V-fold penalization". Considering a particular (though simple) regression problem, we prove that VFCV with a bounded V is suboptimal for model selection, because it "overpenalizes" all the more that V is large. Hence, asymptotic optimality requires V to go to infinity. However, when the signal-to-noise ratio is low, it appears that overpenalizing is necessary, so that the optimal V is not always the largest one, despite the variability issue. This is confirmed by some simulated data. In order to improve on the prediction performance of VFCV, we define a new model selection procedure, called "V-fold penalization" (penVF). It is a V-fold subsampling version of Efron's bootstrap penalties, so that it has the same…
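To make the two procedures named in the abstract concrete, here is a minimal sketch in Python that selects the number of bins of a piecewise-constant least-squares fit on simulated heteroscedastic data. Several things here are assumptions and not taken from this page: the data-generating process, the function names, the choice of equal-width histogram models on [0, 1], the convention that empty bins predict 0, and above all the form of the V-fold penalty. The penalty is written as C times the average over folds of (risk on the full sample minus risk on the training part) for the estimator fitted without that fold, with C = V - 1 by default, which is our reading of a V-fold analogue of a resampling penalty; the paper itself should be consulted for the exact definition and constant.

```python
import numpy as np

def fit_regressogram(x, y, n_bins):
    """Least-squares piecewise-constant fit on n_bins equal-width bins of [0, 1]."""
    idx = np.minimum((x * n_bins).astype(int), n_bins - 1)
    sums = np.bincount(idx, weights=y, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    # Assumption: bins with no training point predict 0.
    return np.divide(sums, counts, out=np.zeros(n_bins), where=counts > 0)

def mse(means, x, y):
    """Mean squared error of the fitted bin means on the points (x, y)."""
    n_bins = len(means)
    idx = np.minimum((x * n_bins).astype(int), n_bins - 1)
    return float(np.mean((y - means[idx]) ** 2))

def make_folds(n, V, rng):
    """Random partition of range(n) into V blocks of nearly equal size."""
    return np.array_split(rng.permutation(n), V)

def vfcv_criterion(x, y, n_bins, folds):
    """Classical V-fold CV: average held-out error over the V folds."""
    scores = []
    for test in folds:
        train = np.ones(len(x), dtype=bool)
        train[test] = False
        means = fit_regressogram(x[train], y[train], n_bins)
        scores.append(mse(means, x[test], y[test]))
    return float(np.mean(scores))

def penvf_criterion(x, y, n_bins, folds, C=None):
    """Empirical risk on the full sample plus an ASSUMED V-fold penalty."""
    V = len(folds)
    C = V - 1 if C is None else C  # assumed default constant
    terms = []
    for test in folds:
        train = np.ones(len(x), dtype=bool)
        train[test] = False
        means = fit_regressogram(x[train], y[train], n_bins)
        # Full-sample risk minus training-sample risk of the same estimator.
        terms.append(mse(means, x, y) - mse(means, x[train], y[train]))
    full_fit = fit_regressogram(x, y, n_bins)
    return mse(full_fit, x, y) + C * float(np.mean(terms))

# Hypothetical heteroscedastic data: the noise level grows with x.
rng = np.random.default_rng(0)
n, V = 200, 5
x = rng.uniform(size=n)
y = np.sin(2 * np.pi * x) + x * rng.normal(size=n)
folds = make_folds(n, V, rng)
candidates = range(1, 21)

best_vfcv = min(candidates, key=lambda d: vfcv_criterion(x, y, d, folds))
best_penvf = min(candidates, key=lambda d: penvf_criterion(x, y, d, folds))
print("bins selected by VFCV:", best_vfcv)
print("bins selected by penVF sketch:", best_penvf)
```

In this sketch the constant C is decoupled from V, so the amount of penalization can be tuned without changing the number of folds, whereas in `vfcv_criterion` both are tied to V. A single simulated sample like this one says nothing about which procedure predicts better; that comparison would need repeated draws and a risk estimate for each selected model.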
Taxonomy
Topics: Statistical Methods and Inference · Statistical Methods and Bayesian Inference · Bayesian Methods and Mixture Models
