Approximate Cross-Validation in High Dimensions with Guarantees

William T. Stephenson; Tamara Broderick

arXiv:1905.13657 · stat.ML · June 24, 2020 · 5 cites

TL;DR

This paper investigates the challenges of approximating leave-one-out cross-validation in high-dimensional settings, showing that most existing methods fail but one can succeed under sparsity assumptions with theoretical guarantees.

Contribution

The paper provides a systematic evaluation of LOOCV approximation methods in high dimensions and introduces a new approach that performs well when the parameter is sparse, with proven theoretical guarantees.

Findings

01. Most approximation methods perform poorly in high dimensions.

02. A new approximation method works well under sparsity assumptions.

03. Theoretical analysis shows the method's error depends on support size, not full dimension.

Abstract

Leave-one-out cross-validation (LOOCV) can be particularly accurate among cross-validation (CV) variants for machine learning assessment tasks -- e.g., assessing methods' error or variability. But it is expensive to re-fit a model $N$ times for a dataset of size $N$. Previous work has shown that approximations to LOOCV can be both fast and accurate -- when the unknown parameter is of small, fixed dimension. But these approximations incur a running time roughly cubic in dimension -- and we show that, besides computational issues, their accuracy dramatically deteriorates in high dimensions. Authors have suggested many potential and seemingly intuitive solutions, but these methods have not yet been systematically evaluated or compared. We find that all but one perform so poorly as to be unusable for approximating LOOCV. Crucially, though, we are able to show, both empirically and…
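
To make the cost trade-off in the abstract concrete, here is a minimal sketch that compares exact LOOCV ($N$ re-fits) with one common style of approximation: a single Newton step away from the full-data fit. This listing does not say which approximations the paper evaluates, so the Newton-step form, the ridge-regression model, and all names (`fit_ridge`, `exact_loocv`, `newton_step_loocv`, `lam`) are assumptions for illustration only and are not taken from the paper or its repository. Ridge regression is used because its objective is quadratic, so the Newton step reproduces the leave-one-out fit exactly and the sketch is easy to check; for non-quadratic losses the step would only be approximate.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Minimize 0.5 * ||X @ theta - y||^2 + 0.5 * lam * ||theta||^2."""
    D = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ y)

def exact_loocv(X, y, lam):
    """Exact LOOCV: re-fit the model N times, once per held-out point."""
    N = X.shape[0]
    errs = np.empty(N)
    for n in range(N):
        keep = np.arange(N) != n
        theta = fit_ridge(X[keep], y[keep], lam)
        errs[n] = (X[n] @ theta - y[n]) ** 2
    return errs

def newton_step_loocv(X, y, lam):
    """Hypothetical Newton-step approximation: fit once, then correct per point.

    Each correction solves a D x D linear system, which is where a cost
    roughly cubic in the dimension D comes from.
    """
    N, D = X.shape
    H = X.T @ X + lam * np.eye(D)            # Hessian of the full objective
    theta_hat = np.linalg.solve(H, X.T @ y)  # single full-data fit
    errs = np.empty(N)
    for n in range(N):
        x_n = X[n]
        g_n = x_n * (x_n @ theta_hat - y[n])   # gradient of the n-th loss term
        H_loo = H - np.outer(x_n, x_n)         # Hessian with point n removed
        theta_n = theta_hat + np.linalg.solve(H_loo, g_n)
        errs[n] = (x_n @ theta_n - y[n]) ** 2
    return errs

# Synthetic data with a sparse true parameter (illustrative values only).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
theta_true = np.array([1.0, -2.0, 0.0, 0.0, 0.5])
y = X @ theta_true + 0.1 * rng.normal(size=40)

exact = exact_loocv(X, y, lam=1.0)
approx = newton_step_loocv(X, y, lam=1.0)
```

Comparing `exact.mean()` with `approx.mean()` gives the two LOOCV error estimates. In this quadratic case they agree to numerical precision; the sketch is meant only to show where the per-point $D \times D$ solve enters, not to reproduce the paper's experiments.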

Code & Models

Repositories

https://bitbucket.org/wtstephe/sparse_appx_cv
Official

Taxonomy

Topics: Statistical Methods and Inference · Machine Learning and Data Classification · Machine Learning and Algorithms