An a Priori Exponential Tail Bound for k-Folds Cross-Validation
Karim Abou-Moustafa, Csaba Szepesvari

TL;DR
This paper introduces an exponential tail bound for the k-fold cross-validation (KFCV) estimate. The bound ties the estimate's concentration around the true risk to the stability of the learning rule and the number of folds k, which raises concerns about how KFCV is used for risk estimation in practice.
Contribution
The paper derives a novel exponential Efron-Stein type tail inequality for a general function of n independent random variables and, under a notion of stability, applies it to analyze the concentration of the k-fold cross-validation estimate around the true risk.
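For orientation, the classical (variance-level) Efron-Stein inequality is stated below. This is the standard textbook form, not the paper's result: the exponential tail version derived in the paper is not reproduced here, and the notation (Z, X_i', Z^{(i)}) is chosen for this illustration only.

```latex
% Classical Efron-Stein inequality (standard form; notation is illustrative).
% X_1, ..., X_n are independent, X_1', ..., X_n' are independent copies,
% Z = f(X_1, ..., X_n), and Z^{(i)} is Z with X_i replaced by X_i'.
\[
  Z^{(i)} = f(X_1, \ldots, X_{i-1}, X_i', X_{i+1}, \ldots, X_n),
  \qquad
  \operatorname{Var}(Z) \;\le\; \frac{1}{2} \sum_{i=1}^{n}
    \mathbb{E}\!\left[ \bigl( Z - Z^{(i)} \bigr)^{2} \right].
\]
```

The inequality controls the variance of Z by how much Z changes when a single input is resampled, which is the same kind of quantity that stability assumptions on a learning rule are meant to bound.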
Findings
The tail bound depends on the stability of the learning rule.
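The concentration of the KFCV estimate around the true risk is governed by the stability of the learning rule and the number of folds k, rather than by the bias and variance of the estimator.
The result raises concerns about the practical use of KFCV and suggests research directions for obtaining reliable empirical estimates of the actual risk.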
Abstract
We consider a priori generalization bounds developed in terms of cross-validation estimates and the stability of learners. In particular, we first derive an exponential Efron-Stein type tail inequality for the concentration of a general function of n independent random variables. Next, under some reasonable notion of stability, we use this exponential tail bound to analyze the concentration of the k-fold cross-validation (KFCV) estimate around the true risk of a hypothesis generated by a general learning rule. While the accumulated literature has often attributed this concentration to the bias and variance of the estimator, our bound attributes this concentration to the stability of the learning rule and the number of folds k. This insight raises valid concerns related to the practical use of KFCV and suggests research directions to obtain reliable empirical estimates of the actual risk.
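As a point of reference for the quantity being analyzed, the following is a minimal sketch of a KFCV estimate. It is not taken from the paper: the function names (`kfcv_estimate`, `mean_learner`, `squared_loss`), the interleaved fold assignment, and the toy learning rule are all assumptions made for illustration, and the paper's precise definition may differ (for example in how folds are formed or averaged).

```python
from statistics import fmean


def kfcv_estimate(sample, k, learn, loss):
    """Average held-out loss over k folds (illustrative sketch only).

    sample: list of observations
    k:      number of folds, with 2 <= k <= len(sample)
    learn:  hypothetical learning rule, maps a training list to a hypothesis
    loss:   hypothetical loss, maps (hypothesis, observation) to a float
    """
    n = len(sample)
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= len(sample)")

    # Assumption: interleaved folds, so fold sizes differ by at most one.
    folds = [sample[i::k] for i in range(k)]

    fold_losses = []
    for i, held_out in enumerate(folds):
        # Train on every fold except the i-th one.
        train = [z for j, fold in enumerate(folds) if j != i for z in fold]
        hypothesis = learn(train)
        # Mean loss of that hypothesis on the held-out fold.
        fold_losses.append(fmean([loss(hypothesis, z) for z in held_out]))

    # The estimate is the average of the per-fold held-out losses.
    return fmean(fold_losses)


def mean_learner(train):
    """Toy learning rule: predict the training mean."""
    return fmean(train)


def squared_loss(hypothesis, z):
    """Squared error between the prediction and the observation."""
    return (hypothesis - z) ** 2


estimate = kfcv_estimate([1.0, 2.0, 3.0, 4.0], 2, mean_learner, squared_loss)
print(estimate)
```

Intuitively, a stable learning rule is one whose output hypothesis changes little when a training point is replaced; the abstract's claim is that this property, together with k, controls how tightly such an estimate concentrates around the true risk.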
Taxonomy
Topics: Machine Learning and Algorithms · VLSI and Analog Circuit Testing · Face and Expression Recognition
