Local asymptotics of cross-validation in least-squares density estimation
Guillaume Maillard (CELESTE, LM-Orsay)

TL;DR
This paper provides a detailed asymptotic analysis of cross-validation methods in density estimation, revealing their behavior near the optimal model and offering insights into their convergence rates.
Contribution
It introduces an asymptotic framework for understanding how the cross-validation risk varies across models in least-squares density estimation, focusing on the neighbourhood of the optimal model.
Findings
Near the optimal model, the CV risk asymptotically behaves like the sum of a convex function and a time-changed symmetrized Brownian motion.
Simple validation and "incomplete" V-fold CV share this asymptotic behavior.
The framework is intended to clarify the rate at which CV-based model selection converges to the oracle, which is generally still unknown.
Abstract
In model selection, several types of cross-validation are commonly used and many variants have been introduced. While consistency of some of these methods has been proven, their rate of convergence to the oracle is generally still unknown. Until now, an asymptotic analysis of cross-validation able to answer this question has been lacking. Existing results focus on the "pointwise" estimation of the risk of a single estimator, whereas analysing model selection requires understanding how the CV risk varies with the model. In this article, we investigate the asymptotics of the CV risk in the neighbourhood of the optimal model, for trigonometric series estimators in density estimation. Asymptotically, simple validation and "incomplete" V-fold CV behave like the sum of a convex function $f_n$ and a symmetrized Brownian motion changed in time $W_{g_n/V}$. We argue that this is the right asymptotic…
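To make the setting concrete, here is a minimal sketch of simple validation (hold-out) for trigonometric series estimators in least-squares density estimation on [0, 1]. It only illustrates how the hold-out risk varies with the model dimension; it does not reproduce the paper's asymptotic analysis. The function names, the ordering of the basis (constant, then cosine and sine pairs), the nested models indexed by dimension k, and the 80/20 split are all assumptions made for illustration.

```python
import numpy as np

def trig_basis(x, k_max):
    """Evaluate the first k_max functions of the orthonormal
    trigonometric basis of L2([0, 1]) at the points x.
    Assumed ordering: 1, sqrt(2)cos(2*pi*x), sqrt(2)sin(2*pi*x), ..."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x)]
    j = 1
    while len(cols) < k_max:
        cols.append(np.sqrt(2) * np.cos(2 * np.pi * j * x))
        if len(cols) < k_max:
            cols.append(np.sqrt(2) * np.sin(2 * np.pi * j * x))
        j += 1
    return np.stack(cols, axis=1)  # shape (n, k_max)

def holdout_risk(x_train, x_val, k_max):
    """Hold-out least-squares risk for every dimension k = 1..k_max.
    The projection estimator on the first k basis functions has
    coefficients equal to training-sample means of the basis functions.
    With contrast ||t||^2 - 2 t(x), the hold-out risk for dimension k is
    sum_{j<k} (b_train[j]**2 - 2 * b_train[j] * b_val[j])."""
    b_train = trig_basis(x_train, k_max).mean(axis=0)
    b_val = trig_basis(x_val, k_max).mean(axis=0)
    return np.cumsum(b_train**2 - 2 * b_train * b_val)

def select_dimension(x, k_max, train_frac=0.8, seed=0):
    """Split the sample once at random and return the dimension
    minimising the hold-out risk, together with the risk curve."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(x))
    n_train = int(train_frac * len(x))
    risks = holdout_risk(x[perm[:n_train]], x[perm[n_train:]], k_max)
    return int(np.argmin(risks)) + 1, risks

if __name__ == "__main__":
    # Hypothetical example data: a Beta(2, 5) sample, supported on [0, 1].
    sample = np.random.default_rng(1).beta(2.0, 5.0, size=2000)
    k_hat, risks = select_dimension(sample, k_max=30)
    print("selected dimension:", k_hat)
```

Plotting `risks` against k for several random splits gives an informal picture of the object studied in the abstract: a roughly convex trend with random fluctuations around its minimiser.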
Taxonomy
Topics: Statistical Methods and Bayesian Inference · Statistical Methods and Inference · Statistical and numerical algorithms
