Approximate Cross-Validation in High Dimensions with Guarantees
William T. Stephenson, Tamara Broderick

TL;DR
This paper investigates the challenges of approximating leave-one-out cross-validation in high-dimensional settings, showing that most existing methods fail but one can succeed under sparsity assumptions with theoretical guarantees.
Contribution
The paper provides a systematic evaluation of LOOCV approximation methods in high dimensions and introduces a new approach that performs well when the parameter is sparse, with proven theoretical guarantees.
Findings
Most approximation methods perform poorly in high dimensions.
A new approximation method works well under sparsity assumptions.
Theoretical analysis shows the method's error depends on support size, not full dimension.
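To picture why the error and cost can depend on the support size rather than the full dimension, here is a minimal sketch. It is not necessarily the paper's exact procedure. It assumes a squared loss with an L1 penalty (the summary does not state the model), fits it with a plain ISTA loop, and then takes the leave-one-out Newton step only on the coordinates where the fitted parameter is nonzero. The names `lasso_ista` and `support_restricted_loo` and the synthetic data are hypothetical.

```python
import numpy as np

def lasso_ista(X, y, lam, n_iter=3000):
    """Minimize 0.5*||X theta - y||^2 + lam*||theta||_1 by proximal gradient (ISTA)."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the smooth part
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = theta - step * (X.T @ (X @ theta - y))
        theta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return theta

def support_restricted_loo(X, y, theta_hat):
    """Approximate each leave-one-out fit with a Newton step on the support only.

    Assumption: the support (and signs) of the leave-one-out solution match
    those of theta_hat, so off-support coordinates stay at zero.
    """
    N, D = X.shape
    S = np.flatnonzero(theta_hat)      # estimated support
    XS = X[:, S]
    H_S = XS.T @ XS                    # Hessian of the smooth part, restricted to S
    resid = X @ theta_hat - y
    out = np.zeros((N, D))
    for n in range(N):
        g_n = XS[n] * resid[n]                   # gradient of the n-th loss term on S
        H_n = H_S - np.outer(XS[n], XS[n])       # Hessian with point n removed
        out[n, S] = theta_hat[S] + np.linalg.solve(H_n, g_n)  # |S| x |S| solve
    return out, S

# Synthetic high-dimensional example: D > N, with 5 truly nonzero coefficients.
rng = np.random.default_rng(0)
N, D = 100, 300
X = rng.normal(size=(N, D))
theta_true = np.zeros(D)
theta_true[:5] = 3.0
y = X @ theta_true + rng.normal(size=N)

lam = 0.3 * np.max(np.abs(X.T @ y))    # illustrative regularization strength
theta_hat = lasso_ista(X, y, lam)
loo, S = support_restricted_loo(X, y, theta_hat)
print("support size:", len(S), "of", D)
```

Each linear solve here involves a matrix whose side is the support size, not `D`, which is the intuition behind the third finding. The sketch does not check whether the support actually stays fixed when a point is removed.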
Abstract
Leave-one-out cross-validation (LOOCV) can be particularly accurate among cross-validation (CV) variants for machine learning assessment tasks -- e.g., assessing methods' error or variability. But it is expensive to re-fit a model N times for a dataset of size N. Previous work has shown that approximations to LOOCV can be both fast and accurate -- when the unknown parameter is of small, fixed dimension. But these approximations incur a running time roughly cubic in dimension -- and we show that, besides computational issues, their accuracy dramatically deteriorates in high dimensions. Authors have suggested many potential and seemingly intuitive solutions, but these methods have not yet been systematically evaluated or compared. We find that all but one perform so poorly as to be unusable for approximating LOOCV. Crucially, though, we are able to show, both empirically and…
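The trade-off described in the abstract can be sketched on a small example. The code below is an illustration under stated assumptions, not the authors' implementation: it uses ridge regression (a choice made here only because everything is available in closed form), and the function names `ridge_fit`, `exact_loo`, and `approx_loo` are hypothetical. It compares exact LOOCV, which re-fits N times, against two commonly used approximations built from a single full-data fit: a Newton step, and an infinitesimal-jackknife-style step that reuses the full-data Hessian.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimize 0.5*||X theta - y||^2 + 0.5*lam*||theta||^2 in closed form."""
    D = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ y)

def exact_loo(X, y, lam):
    """Exact LOOCV parameters: one full re-fit per left-out point (N fits)."""
    N = X.shape[0]
    keep = np.ones(N, dtype=bool)
    fits = []
    for n in range(N):
        keep[n] = False
        fits.append(ridge_fit(X[keep], y[keep], lam))
        keep[n] = True
    return np.array(fits)

def approx_loo(X, y, lam, newton=True):
    """Approximate LOOCV parameters from a single fit on all the data.

    newton=True : Newton step using the Hessian with point n removed.
    newton=False: infinitesimal-jackknife-style step using the full Hessian.
    Each step solves a D x D linear system, hence the roughly cubic cost in D.
    """
    N, D = X.shape
    theta_hat = ridge_fit(X, y, lam)
    H = X.T @ X + lam * np.eye(D)      # Hessian of the full objective
    resid = X @ theta_hat - y
    out = np.empty((N, D))
    for n in range(N):
        g_n = X[n] * resid[n]          # gradient of the n-th loss term at theta_hat
        H_n = H - np.outer(X[n], X[n]) if newton else H
        out[n] = theta_hat + np.linalg.solve(H_n, g_n)
    return out

# Small synthetic problem in the low-dimensional regime (D much smaller than N).
rng = np.random.default_rng(0)
N, D, lam = 40, 5, 1.0
X = rng.normal(size=(N, D))
y = X @ rng.normal(size=D) + 0.1 * rng.normal(size=N)

exact = exact_loo(X, y, lam)
ns = approx_loo(X, y, lam, newton=True)
ij = approx_loo(X, y, lam, newton=False)
print("max |Newton step - exact|:", np.abs(ns - exact).max())
print("max |IJ step - exact|:    ", np.abs(ij - exact).max())
```

Because the ridge objective is quadratic, the Newton step reproduces the exact leave-one-out fit here, so the example shows only the mechanics. For non-quadratic losses both variants are genuine approximations, and the abstract's point is that their accuracy and cost degrade as the dimension grows.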
Taxonomy
Topics: Statistical Methods and Inference · Machine Learning and Data Classification · Machine Learning and Algorithms
