A Modern Theory of Cross-Validation through the Lens of Stability

Jing Lei

arXiv:2505.23592 · math.ST · October 30, 2025

Open Access

TL;DR

This paper explores the theoretical foundations of cross-validation using stability concepts, providing new insights and tools for model selection, uncertainty quantification, and conformal prediction in complex data analysis.

Contribution

It offers a comprehensive theoretical analysis of cross-validation through stability, advancing understanding of its role in uncertainty quantification and model selection.

Findings

01. Theoretical results on CV for estimating generalization error

02. Stability-based bounds for model selection accuracy

03. New methods for uncertainty quantification in CV-based risk estimates
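
To make the object of these findings concrete, here is a minimal sketch of a K-fold CV risk estimate with a naive standard error. It is an illustration only, not the paper's method: the function names `fit_least_squares` and `k_fold_cv_risk`, the simulated data, the squared-error loss, and the choice of simple linear regression are all assumptions made for this example. The naive standard error treats the per-sample losses as independent, which they are not, because fitted models from different folds share training data.

```python
import random
import statistics


def fit_least_squares(xs, ys):
    """Fit y = a + b*x by ordinary least squares; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx > 0 else 0.0
    return my - b * mx, b


def k_fold_cv_risk(xs, ys, k=5, seed=0):
    """Return (cv_risk, naive_se, per_sample_losses) under squared-error loss."""
    n = len(xs)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[j::k] for j in range(k)]  # k disjoint folds covering every index
    losses = [0.0] * n
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        # Fit on the other k-1 folds, then score only the held-out points.
        a, b = fit_least_squares([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            losses[i] = (ys[i] - (a + b * xs[i])) ** 2
    cv_risk = statistics.fmean(losses)
    # Naive SE: ignores dependence between losses (assumption of this sketch).
    naive_se = statistics.stdev(losses) / n ** 0.5
    return cv_risk, naive_se, losses


# Hypothetical simulated data: y = 2x + Gaussian noise with standard deviation 0.5.
rng = random.Random(1)
xs = [rng.uniform(-1, 1) for _ in range(200)]
ys = [2.0 * x + rng.gauss(0, 0.5) for x in xs]
cv_risk, naive_se, losses = k_fold_cv_risk(xs, ys, k=5)
print(f"CV risk estimate: {cv_risk:.3f} (naive SE {naive_se:.3f})")
```

Every observation is scored exactly once by a model that did not see it during fitting, so the average loss serves as an estimate of out-of-sample risk. Whether an interval built from the naive standard error is valid is the kind of question that calls for a stability-based analysis.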

Abstract

Modern data analysis and statistical learning are marked by complex data structures and black-box algorithms. Data complexity stems from technologies such as imaging, remote sensing, wearable devices, and genomic sequencing. At the same time, black-box models, especially deep neural networks, have achieved impressive results. This combination raises new challenges for uncertainty quantification and statistical inference, which we refer to as "black-box inference." Black-box inference is difficult due to the lack of traditional modeling assumptions and the opaque behavior of modern estimators. These factors make it hard to characterize the distribution of estimation errors. A popular solution is post-hoc randomization, which, under mild assumptions such as exchangeability, can yield valid uncertainty quantification. Such methods range from classical techniques like permutation tests,…

Taxonomy

Topics: Statistical Methods and Inference · Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning