Error bounds in estimating the out-of-sample prediction error using leave-one-out cross validation in high-dimensions

Kamiar Rahnama Rad; Wenda Zhou; Arian Maleki

arXiv:2003.01770 · stat.ML · March 5, 2020 · 6 cites

Open Access

TL;DR

This paper provides finite-sample bounds on the accuracy of leave-one-out cross validation (LO) for estimating out-of-sample prediction error in high-dimensional penalized regression, and shows that these bounds vanish as both the sample size and the number of features grow.

Contribution

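It offers the first finite-sample upper bounds on the error of LO, viewed as an estimator of out-of-sample prediction error, for penalized generalized linear models in high dimensions, without sparsity assumptions on the regression coefficients.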
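To make "the error of LO" concrete, here is one way to write the quantities involved. The notation (fitting loss $\ell$, regularizer $r$, tuning parameter $\lambda$, prediction loss $\phi$, training data $\mathcal{D}$, fresh draw $(x_0, y_0)$) is our own assumption and may differ from the paper's. The fit on all data and the fit with observation $i$ removed are

$$
\hat{\beta} = \arg\min_{\beta \in \mathbb{R}^p} \sum_{j=1}^{n} \ell(y_j, x_j^\top \beta) + \lambda r(\beta),
\qquad
\hat{\beta}_{/i} = \arg\min_{\beta \in \mathbb{R}^p} \sum_{j \neq i} \ell(y_j, x_j^\top \beta) + \lambda r(\beta).
$$

LO and the out-of-sample error it is meant to estimate are then

$$
\mathrm{LO} = \frac{1}{n} \sum_{i=1}^{n} \phi\big(y_i, x_i^\top \hat{\beta}_{/i}\big),
\qquad
\mathrm{Err} = \mathbb{E}\big[\phi(y_0, x_0^\top \hat{\beta}) \,\big|\, \mathcal{D}\big],
$$

and the abstract's "expected squared error of LO in estimating the out-of-sample error" corresponds to $\mathbb{E}\big[(\mathrm{LO} - \mathrm{Err})^2\big]$.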

Findings

01

Bounds on the error of LO go to zero as n, p → ∞

02

LO remains accurate even when p exceeds n

03

The theory connects to scalable approximate LO methods
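One hedged illustration of how LO can be computed without n refits: for ridge regression (squared loss plus an $\ell_2$ penalty), the leave-one-out residual has the exact closed form $(y_i - x_i^\top\hat\beta)/(1 - H_{ii})$ with $H = X(X^\top X + \lambda I)^{-1}X^\top$. As we understand it, approximate LO methods extend this kind of leverage-based correction to other losses and penalties; the page itself does not say which methods the theory connects to. The pure-Python sketch below uses hypothetical helper names (`inverse`, `ridge_fit`, `lo_brute_force`, `lo_shortcut`) and synthetic data, and checks that the single-fit shortcut agrees with brute-force refitting when p > n.

```python
import random


def inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        d = m[col][col]
        m[col] = [v / d for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(a, v):
    return [dot(row, v) for row in a]


def gram_plus_ridge(X, lam):
    """Return X^T X + lam * I for a list-of-rows matrix X."""
    p = len(X[0])
    return [[sum(row[j] * row[k] for row in X) + (lam if j == k else 0.0)
             for k in range(p)] for j in range(p)]


def xt_y(X, y):
    """Return X^T y."""
    return [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(len(X[0]))]


def ridge_fit(X, y, lam):
    """Ridge estimate beta = (X^T X + lam I)^{-1} X^T y."""
    return matvec(inverse(gram_plus_ridge(X, lam)), xt_y(X, y))


def lo_brute_force(X, y, lam):
    """Exact LO squared error: refit n times, each time dropping one observation."""
    n = len(X)
    total = 0.0
    for i in range(n):
        beta_i = ridge_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], lam)
        total += (y[i] - dot(X[i], beta_i)) ** 2
    return total / n


def lo_shortcut(X, y, lam):
    """Same quantity from one fit: LO residual = (y_i - x_i^T beta) / (1 - H_ii)."""
    a_inv = inverse(gram_plus_ridge(X, lam))
    beta = matvec(a_inv, xt_y(X, y))
    total = 0.0
    for xi, yi in zip(X, y):
        h_ii = dot(xi, matvec(a_inv, xi))  # leverage H_ii = x_i^T (X^T X + lam I)^{-1} x_i
        total += ((yi - dot(xi, beta)) / (1.0 - h_ii)) ** 2
    return total / len(X)


random.seed(0)
n, p, lam = 20, 30, 1.0  # p > n on purpose
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [random.gauss(0.0, 1.0) for _ in range(n)]
bf = lo_brute_force(X, y, lam)
sc = lo_shortcut(X, y, lam)
print(bf, sc)
```

Brute force costs n refits, while the shortcut inverts $X^\top X + \lambda I$ once and reuses it for every leverage $H_{ii}$. For losses other than squared error the analogous one-step formula is, as far as we know, only an approximation, which is where a theory of LO accuracy becomes relevant.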

Abstract

We study the problem of out-of-sample risk estimation in the high-dimensional regime where both the sample size $n$ and number of features $p$ are large, and $n / p$ can be less than one. Extensive empirical evidence confirms the accuracy of leave-one-out cross validation (LO) for out-of-sample risk estimation. Yet, a unifying theoretical evaluation of the accuracy of LO in high-dimensional problems has remained an open problem. This paper aims to fill this gap for penalized regression in the generalized linear family. With minor assumptions about the data generating process, and without any sparsity assumptions on the regression coefficients, our theoretical analysis obtains finite sample upper bounds on the expected squared error of LO in estimating the out-of-sample error. Our bounds show that the error goes to zero as $n, p \to \infty$, even when the dimension $p$ of the…
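A small simulation can make the abstract's setting tangible: dense coefficients (no sparsity), $p > n$, and LO compared against the true out-of-sample error. The sketch below is our own construction, not the paper's experiment. It assumes ridge regression, a linear model $y = X\beta^\ast + \varepsilon$ with rows $x_i \sim N(0, I_p/p)$ and $\varepsilon \sim N(0, \sigma^2)$, and squared prediction loss; under those assumptions $\mathrm{Err} = \sigma^2 + \lVert\hat\beta - \beta^\ast\rVert^2/p$ exactly, so no test set is needed. The function names `inverse` and `lo_and_err` are hypothetical.

```python
import math
import random


def inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        d = m[col][col]
        m[col] = [v / d for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def lo_and_err(n, p, lam, sigma, seed=0):
    """Simulate a dense linear model and return (LO, Err) for ridge regression."""
    rng = random.Random(seed)
    beta_star = [rng.gauss(0.0, 1.0) for _ in range(p)]  # dense: no sparsity assumed
    X = [[rng.gauss(0.0, 1.0 / math.sqrt(p)) for _ in range(p)] for _ in range(n)]
    y = [sum(a * b for a, b in zip(row, beta_star)) + rng.gauss(0.0, sigma) for row in X]
    # Dual form, cheap when p > n: M = (X X^T + lam I)^{-1} and beta_hat = X^T M y.
    K = [[sum(a * b for a, b in zip(xi, xj)) + (lam if i == j else 0.0)
          for j, xj in enumerate(X)] for i, xi in enumerate(X)]
    M = inverse(K)
    My = [sum(m * yj for m, yj in zip(row, y)) for row in M]
    beta_hat = [sum(X[i][k] * My[i] for i in range(n)) for k in range(p)]
    # Exact ridge LO residual without refitting: y_i - x_i^T beta_hat_{/i} = (M y)_i / M_ii.
    lo = sum((My[i] / M[i][i]) ** 2 for i in range(n)) / n
    # Out-of-sample error for a fresh (x0, y0) from the same model, in closed form:
    # E[(y0 - x0^T beta_hat)^2 | data] = sigma^2 + ||beta_hat - beta_star||^2 / p.
    err = sigma ** 2 + sum((b - s) ** 2 for b, s in zip(beta_hat, beta_star)) / p
    return lo, err


for n, p in [(40, 80), (80, 160)]:
    lo, err = lo_and_err(n, p, lam=0.5, sigma=0.5)
    print(f"n={n}, p={p}: LO={lo:.3f}, Err={err:.3f}")
```

With $n/p$ held fixed, one would expect $|\mathrm{LO} - \mathrm{Err}|$ to shrink on average as $n$ and $p$ grow, but a single seed says little; averaging over many seeds is the fairer check. The dual form relies on the identity $(X^\top X + \lambda I)^{-1}X^\top = X^\top(XX^\top + \lambda I)^{-1}$, which keeps the linear algebra at size $n \times n$.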

Taxonomy

Topics: Statistical Methods and Inference · Sparse and Compressive Sensing Techniques · Probabilistic and Robust Engineering Design