Is Cross-Validation the Gold Standard to Evaluate Model Performance?

Garud Iyengar; Henry Lam; Tianyu Wang

arXiv:2407.02754·math.ST·August 22, 2024·1 cites

Open Access

TL;DR

This paper critically examines the statistical advantages of cross-validation over simple plug-in methods for model evaluation, revealing that CV often does not outperform plug-in in bias and coverage, especially in nonparametric settings.

Contribution

The paper provides a theoretical comparison between cross-validation and plug-in methods, showing CV's limitations and introducing a novel higher-order Taylor analysis for evaluation.

Findings

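01. K-fold CV does not outperform plug-in in asymptotic bias or in the coverage accuracy of the associated interval.

02. Leave-one-out CV can have a smaller bias than plug-in, but the improvement is negligible compared to the variability of the evaluation.

03. Numerical results confirm plug-in's competitive performance across examples.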

Abstract

Cross-Validation (CV) is the default choice for evaluating the performance of machine learning models. Despite its wide usage, its statistical benefits have remained half-understood, especially in challenging nonparametric regimes. In this paper we fill this gap and show that, for a wide spectrum of models, CV does not statistically outperform the simple "plug-in" approach where one reuses training data for testing evaluation. Specifically, in terms of both the asymptotic bias and the coverage accuracy of the associated interval for out-of-sample evaluation, $K$-fold CV provably cannot outperform plug-in regardless of the rate at which the parametric or nonparametric models converge. Leave-one-out CV can have a smaller bias than plug-in; however, this bias improvement is negligible compared to the variability of the evaluation, and in some important cases…
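To make the two evaluation approaches concrete, here is a minimal sketch in Python (NumPy only). It is not the authors' code: the linear model, the squared loss, the synthetic data, the normal-approximation interval, and all function names (`fit_linear`, `plug_in`, `k_fold_cv`) are assumptions chosen for illustration. The plug-in estimate reuses the training data for evaluation, while the $K$-fold estimate evaluates each fold with a model fitted on the remaining folds.

```python
import numpy as np

def fit_linear(X, y):
    # Ordinary least squares with an intercept column (illustrative model choice).
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def squared_losses(coef, X, y):
    # Per-sample squared error of the fitted linear model.
    A = np.column_stack([np.ones(len(X)), X])
    return (A @ coef - y) ** 2

def interval(losses, z=1.96):
    # Normal-approximation interval built from per-sample losses (an assumption here).
    mean = losses.mean()
    half = z * losses.std(ddof=1) / np.sqrt(len(losses))
    return mean, mean - half, mean + half

def plug_in(X, y):
    # Plug-in: train on all data, then evaluate on the same data.
    coef = fit_linear(X, y)
    return interval(squared_losses(coef, X, y))

def k_fold_cv(X, y, k=5, seed=0):
    # K-fold CV: each sample's loss comes from a model that did not see it.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    losses = np.empty(len(X))
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        coef = fit_linear(X[train], y[train])
        losses[fold] = squared_losses(coef, X[fold], y[fold])
    return interval(losses)

# Synthetic data, purely for illustration.
rng = np.random.default_rng(1)
n, d = 200, 3
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

plug_mean, plug_lo, plug_hi = plug_in(X, y)
cv_mean, cv_lo, cv_hi = k_fold_cv(X, y, k=5)
print(f"plug-in  : {plug_mean:.3f}  [{plug_lo:.3f}, {plug_hi:.3f}]")
print(f"5-fold CV: {cv_mean:.3f}  [{cv_lo:.3f}, {cv_hi:.3f}]")
```

On a single dataset both procedures return a point estimate and an interval for the out-of-sample loss. Judging bias or coverage, which is what the abstract compares, would require repeating this over many simulated datasets and checking each interval against the true out-of-sample loss.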

Taxonomy

Topics: Evaluation and Performance Assessment