Optimal cross-validation in density estimation with the $L^2$-loss
Alain Celisse

TL;DR
This paper provides a detailed analysis of cross-validation methods for density estimation using the $L^2$-loss, establishing optimality results and practical improvements over $V$-fold cross-validation.
Contribution
It derives closed-form expressions for leave-$p$-out CV risk estimators of projection estimators, shows that they reduce variability and computational cost compared with $V$-fold CV, and proves that leave-one-out CV is optimal among CV procedures for risk estimation.
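To make the closed form concrete, the display below writes out the Lpo risk estimator and the expression obtained by averaging over all $\binom{n}{p}$ test sets. It assumes an orthonormal family $(\psi_\lambda)_{\lambda \in \Lambda_m}$ spanning the model, the projection estimator $\hat s_m = \sum_\lambda (P_n \psi_\lambda)\psi_\lambda$ and the $L^2$ contrast $\gamma(t, x) = \|t\|^2 - 2t(x)$; the shorthands $A_\lambda$ and $B_\lambda$ are ours. It is a derivation sketch from these definitions, not a verbatim statement of the paper's theorem.

```latex
\[
\hat R_p(m) = \binom{n}{p}^{-1} \sum_{\substack{E \subset \{1,\dots,n\} \\ |E| = p}}
  \Big[ \big\|\hat s_m^{(E^c)}\big\|^2 - \frac{2}{p} \sum_{j \in E} \hat s_m^{(E^c)}(X_j) \Big],
\qquad
\hat s_m^{(E^c)} = \sum_{\lambda \in \Lambda_m} \Big( \frac{1}{n-p} \sum_{i \notin E} \psi_\lambda(X_i) \Big) \psi_\lambda
\]
\[
A_\lambda = \sum_{i=1}^{n} \psi_\lambda(X_i)^2,
\qquad
B_\lambda = \sum_{i \neq k} \psi_\lambda(X_i)\, \psi_\lambda(X_k)
\]
\[
\hat R_p(m) = \frac{1}{n(n-p)} \sum_{\lambda \in \Lambda_m}
  \Big[ A_\lambda - \frac{n-p+1}{n-1}\, B_\lambda \Big]
\]
```

The averaging step uses three inclusion probabilities: an index $i$ lies in the training set $E^c$ with probability $(n-p)/n$, an ordered pair $i \neq k$ lies in $E^c$ with probability $(n-p)(n-p-1)/(n(n-1))$, and $i \in E^c$, $j \in E$ holds with probability $p(n-p)/(n(n-1))$. Setting $p = 1$ gives the Loo estimator $\frac{1}{n(n-1)} \sum_\lambda (A_\lambda - B_\lambda)$. Since $B_\lambda = \big(\sum_i \psi_\lambda(X_i)\big)^2 - A_\lambda$, the whole expression costs $O(n\,|\Lambda_m|)$ evaluations of the $\psi_\lambda$, instead of $\binom{n}{p}$ refits.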
Findings
Closed-form expressions reduce variability and improve computational efficiency compared with $V$-fold CV.
Leave-one-out CV is optimal among CV procedures for risk estimation.
Model selection consistency depends on the ratio $p/n$ and on the estimator's convergence rate.
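The computational gain can be checked on the simplest projection estimator, a regular histogram on $[0,1)$ with $D$ bins of width $h = 1/D$ and $\psi_\lambda = \mathbb{1}_{I_\lambda}/\sqrt{h}$. The sketch below is ours, not the paper's code. The hypothetical function `lpo_risk_closed_form` evaluates $\frac{1}{n(n-p)}\sum_\lambda \big[A_\lambda - \frac{n-p+1}{n-1}B_\lambda\big]$ with $A_\lambda = N_\lambda/h$ and $B_\lambda = N_\lambda(N_\lambda - 1)/h$, where $N_\lambda$ are the bin counts (a formula we derived from the projection estimator and the contrast $\gamma(t,x) = \|t\|^2 - 2t(x)$). The hypothetical `lpo_risk_brute_force` averages the held-out contrast over all $\binom{n}{p}$ test sets. The Beta(2, 5) data are synthetic.

```python
from itertools import combinations
from math import comb

import numpy as np


def bin_index(x, n_bins):
    """Cell index of each point for a regular partition of [0, 1) into n_bins cells."""
    return np.minimum((np.asarray(x) * n_bins).astype(int), n_bins - 1)


def lpo_risk_closed_form(x, n_bins, p):
    """Closed-form leave-p-out L2 risk estimate for a regular histogram on [0, 1).

    Assumption (our derivation): with psi_lambda = 1_{I_lambda} / sqrt(h),
    A_lambda = N_lambda / h and B_lambda = N_lambda (N_lambda - 1) / h, and
    R_p = sum_lambda [A_lambda - (n - p + 1) / (n - 1) * B_lambda] / (n (n - p)).
    """
    n = len(x)
    h = 1.0 / n_bins
    counts = np.bincount(bin_index(x, n_bins), minlength=n_bins).astype(float)
    a = counts / h
    b = counts * (counts - 1) / h
    return float(np.sum(a - (n - p + 1) / (n - 1) * b) / (n * (n - p)))


def lpo_risk_brute_force(x, n_bins, p):
    """Average the held-out L2 contrast over all C(n, p) test sets (exponential cost)."""
    x = np.asarray(x)
    n = len(x)
    h = 1.0 / n_bins
    bins = bin_index(x, n_bins)
    total = 0.0
    for test in combinations(range(n), p):
        test = list(test)
        train_mask = np.ones(n, dtype=bool)
        train_mask[test] = False
        train_counts = np.bincount(bins[train_mask], minlength=n_bins)
        heights = train_counts / ((n - p) * h)  # histogram density fitted without the test set
        sq_norm = np.sum(heights ** 2) * h      # ||s_hat||^2 on [0, 1)
        total += sq_norm - 2.0 / p * np.sum(heights[bins[test]])
    return total / comb(n, p)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.beta(2, 5, size=12)  # synthetic sample, for illustration only
    for p in (1, 4, 8):
        print(p, lpo_risk_closed_form(x, 4, p), lpo_risk_brute_force(x, 4, p))
```

On small samples the two functions should agree up to floating-point error. The closed form needs one pass over the data per model, whereas enumeration grows like $\binom{n}{p}$ and is only usable at toy sizes.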
Abstract
We analyze the performance of cross-validation (CV) in the density estimation framework with two purposes: (i) risk estimation and (ii) model selection. The main focus is on the so-called leave-$p$-out CV procedure (Lpo), where $p$ denotes the cardinality of the test set. Closed-form expressions are derived for the Lpo estimator of the risk of projection estimators. These expressions provide a great improvement upon $V$-fold cross-validation in terms of variability and computational complexity. From a theoretical point of view, closed-form expressions also make it possible to study the performance of Lpo in terms of risk estimation. The optimality of leave-one-out (Loo), that is Lpo with $p = 1$, is proved among CV procedures used for risk estimation. Two model selection frameworks are also considered: estimation, as opposed to identification. For estimation with finite sample size $n$,…
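For the model selection side, a hedged illustration: choose the number of bins $D$ of a regular histogram on $[0,1)$ by minimizing the closed-form Lpo criterion for several values of $p$. Because the bin counts $N_\lambda$ sum to $n$, the criterion simplifies to $\hat R_p(D) = \frac{D}{n(n-p)}\big[n - \frac{n-p+1}{n-1}\sum_\lambda N_\lambda(N_\lambda - 1)\big]$ (our derivation). This is our own sketch: the helper names `lpo_histogram_risk` and `select_n_bins`, the candidate grid and the Beta(2, 5) sample are assumptions, and the code does not reproduce the paper's guidance on how $p$ should scale with $n$.

```python
import numpy as np


def lpo_histogram_risk(x, n_bins, p):
    """Closed-form leave-p-out L2 risk estimate for a regular histogram on [0, 1).

    Assumes every observation lies in [0, 1), so sum_lambda A_lambda = n / h and
    R_p = (n - (n - p + 1) / (n - 1) * sum_lambda N_lambda (N_lambda - 1)) / (n (n - p) h).
    """
    x = np.asarray(x)
    n = len(x)
    h = 1.0 / n_bins
    bins = np.minimum((x * n_bins).astype(int), n_bins - 1)
    counts = np.bincount(bins, minlength=n_bins).astype(float)
    pairs = np.sum(counts * (counts - 1))  # ordered pairs of distinct points sharing a bin
    return (n - (n - p + 1) / (n - 1) * pairs) / (n * (n - p) * h)


def select_n_bins(x, candidates, p):
    """Return the candidate number of bins that minimizes the Lpo risk estimate."""
    risks = [lpo_histogram_risk(x, d, p) for d in candidates]
    return candidates[int(np.argmin(risks))]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.beta(2, 5, size=500)  # synthetic sample, for illustration only
    candidates = list(range(1, 61))
    for p in (1, 50, 250, 450):
        chosen = select_n_bins(x, candidates, p)
        print(f"p = {p:3d} (p/n = {p / len(x):.2f}): selected bins = {chosen}")
```

The weight $(n-p+1)/(n-1)$ on the pair-count term shrinks as $p$ grows, so the choice of $p$ shifts the fit/complexity trade-off; the Findings above tie the consistency of such a selection to $p/n$.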
