Error bounds in estimating the out-of-sample prediction error using leave-one-out cross validation in high-dimensions
Kamiar Rahnama Rad, Wenda Zhou, Arian Maleki

TL;DR
This paper provides theoretical bounds on the accuracy of leave-one-out cross validation for estimating out-of-sample prediction error in high-dimensional penalized regression, showing the bounds diminish as sample and feature sizes grow.
Contribution
It offers the first finite-sample upper bounds on the error of LO as an estimate of out-of-sample prediction error for penalized generalized linear models in high dimensions, without sparsity assumptions on the regression coefficients.
Findings
The bounds on LO's estimation error go to zero as n, p → ∞
LO remains accurate even when p exceeds n
The theory connects to scalable approximate LO methods
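A well-known case where LO is cheap is ridge regression (squared loss plus an ℓ2 penalty). There, the LO residuals follow exactly from a single fit through the hat matrix H: y_i - x_i^T b_{-i} = (y_i - x_i^T b) / (1 - H_ii). The minimal NumPy sketch below compares that shortcut with brute-force refitting. The Gaussian design, the sizes n and p, the penalty lam, and the helper names are illustrative assumptions, not taken from the paper, and this is not claimed to be the specific approximation the paper analyzes.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge estimate argmin_b 0.5*||y - X b||^2 + 0.5*lam*||b||^2, via the n x n dual form."""
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)

def lo_residuals_brute(X, y, lam):
    """LO residuals y_i - x_i^T b_{-i}, refitting once per held-out observation."""
    n = X.shape[0]
    res = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        b_minus_i = ridge_fit(X[keep], y[keep], lam)
        res[i] = y[i] - X[i] @ b_minus_i
    return res

def lo_residuals_shortcut(X, y, lam):
    """Same residuals from a single fit: (y_i - x_i^T b) / (1 - H_ii)."""
    n = X.shape[0]
    K = X @ X.T
    # Ridge hat matrix H = X (X^T X + lam I)^{-1} X^T, which equals (K + lam I)^{-1} K.
    H = np.linalg.solve(K + lam * np.eye(n), K)
    return (y - H @ y) / (1.0 - np.diag(H))

rng = np.random.default_rng(0)
n, p, lam = 100, 200, 0.5  # illustrative sizes with p > n
X = rng.standard_normal((n, p)) / np.sqrt(p)  # rows x_i ~ N(0, I/p)
beta = rng.standard_normal(p)
y = X @ beta + 0.5 * rng.standard_normal(n)

brute = lo_residuals_brute(X, y, lam)
fast = lo_residuals_shortcut(X, y, lam)
print("max |brute - shortcut|:", np.max(np.abs(brute - fast)))
print("LO estimate of out-of-sample squared error:", np.mean(fast ** 2))
```

The brute-force version costs n refits, while the shortcut needs one fit plus the diagonal of H. For losses or penalties other than this quadratic case the single-fit formula is no longer exact, which is the gap approximate LO methods aim to fill.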
Abstract
We study the problem of out-of-sample risk estimation in the high dimensional regime where both the sample size n and number of features p are large, and n/p can be less than one. Extensive empirical evidence confirms the accuracy of leave-one-out cross validation (LO) for out-of-sample risk estimation. Yet, a unifying theoretical evaluation of the accuracy of LO in high-dimensional problems has remained an open problem. This paper aims to fill this gap for penalized regression in the generalized linear family. With minor assumptions about the data generating process, and without any sparsity assumptions on the regression coefficients, our theoretical analysis obtains finite sample upper bounds on the expected squared error of LO in estimating the out-of-sample error. Our bounds show that the error goes to zero as n, p → ∞, even when the dimension p of the…
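The claim that LO tracks the out-of-sample error even when p exceeds n can be checked numerically in a simple setting. The sketch below is not the paper's experiment. It assumes ridge regression, Gaussian rows x_i ~ N(0, I/p), and illustrative values of n, p, lam and sigma. Under that design the out-of-sample error of a fit b has the closed form sigma^2 + ||b - beta||^2 / p, so it can be compared directly with the brute-force LO estimate.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, lam, sigma = 300, 600, 0.5, 0.5  # illustrative choices, p/n = 2
X = rng.standard_normal((n, p)) / np.sqrt(p)  # rows x_i ~ N(0, I/p)
beta = rng.standard_normal(p)
y = X @ beta + sigma * rng.standard_normal(n)

# Ridge objective: 0.5*||y - X b||^2 + 0.5*lam*||b||^2.
# In dual form b = X^T alpha with alpha = (K + lam I)^{-1} y, where K = X X^T,
# so every refit only needs the (n-1) x (n-1) Gram submatrix.
K = X @ X.T

# Leave-one-out estimate: refit without observation i, then score the fit on i.
lo_sq = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    alpha = np.linalg.solve(K[np.ix_(keep, keep)] + lam * np.eye(n - 1), y[keep])
    pred_i = K[i, keep] @ alpha  # equals x_i^T b_{-i} with b_{-i} = X[keep].T @ alpha
    lo_sq[i] = (y[i] - pred_i) ** 2
lo_estimate = lo_sq.mean()

# Out-of-sample error of the full-data fit for a fresh (x0, y0) from the same model:
# with x0 ~ N(0, I/p), E(y0 - x0^T b)^2 = sigma^2 + ||b - beta||^2 / p.
b_hat = X.T @ np.linalg.solve(K + lam * np.eye(n), y)
oos_error = sigma ** 2 + np.sum((b_hat - beta) ** 2) / p

print(f"LO estimate: {lo_estimate:.3f}   out-of-sample error: {oos_error:.3f}")
```

With these sizes the two numbers should be close but not identical. The abstract's bounds concern how that gap shrinks as n and p grow together.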
Taxonomy
Topics: Statistical Methods and Inference · Sparse and Compressive Sensing Techniques · Probabilistic and Robust Engineering Design
