Approximate Cross-Validation with Low-Rank Data in High Dimensions

William T. Stephenson; Madeleine Udell; Tamara Broderick

arXiv:2008.10547 · stat.ML · November 3, 2022

TL;DR

This paper introduces a fast and accurate approximate cross-validation (ACV) method tailored to high-dimensional, approximately low-rank data. It overcomes the speed and accuracy limitations of existing ACV techniques in high dimensions by leveraging low-rank approximations of the Hessian.

Contribution

The authors develop a novel ACV algorithm that exploits low-rank Hessian structure, providing theoretical guarantees and improved speed and accuracy in high-dimensional settings.

Findings

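01. On high-dimensional data with approximately low-rank structure, the new method is faster and more accurate than existing ACV approaches.

02. The approximation error of the proposed method depends on the (approximate) rank of the data rather than on the full dimension.

03. Theoretical bounds on the approximation error are validated on real and simulated data.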

Abstract

Many recent advances in machine learning are driven by a challenging trifecta: large data size $N$; high dimensions; and expensive algorithms. In this setting, cross-validation (CV) serves as an important tool for model assessment. Recent advances in approximate cross-validation (ACV) provide accurate approximations to CV with only a single model fit, avoiding traditional CV's requirement for repeated runs of expensive algorithms. Unfortunately, these ACV methods can lose both speed and accuracy in high dimensions -- unless sparsity structure is present in the data. Fortunately, there is an alternative type of simplifying structure that is present in most data: approximate low rank (ALR). Guided by this observation, we develop a new algorithm for ACV that is fast and accurate in the presence of ALR data. Our first key insight is that the Hessian matrix -- whose inverse forms the…
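
To make the single-fit idea concrete, here is a minimal sketch of ACV for leave-one-out CV on ridge regression, where one Newton step from the full-data fit recovers each leave-one-out prediction exactly. It is an illustration under stated assumptions, not the paper's algorithm (the abstract above is truncated before the method is described). The squared-error loss, the function names `fit_ridge`, `exact_loo`, `acv_newton_step`, and `acv_low_rank`, and the use of a truncated SVD of the data to approximate the inverse Hessian are all assumptions made here.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Minimise 0.5*||X @ theta - y||^2 + 0.5*lam*||theta||^2."""
    D = X.shape[1]
    H = X.T @ X + lam * np.eye(D)          # Hessian of the full objective
    theta = np.linalg.solve(H, X.T @ y)
    return theta, H

def exact_loo(X, y, lam):
    """Exact leave-one-out predictions: N separate refits."""
    N = X.shape[0]
    preds = np.empty(N)
    for n in range(N):
        keep = np.arange(N) != n
        theta_n, _ = fit_ridge(X[keep], y[keep], lam)
        preds[n] = X[n] @ theta_n
    return preds

def acv_newton_step(X, y, lam):
    """Leave-one-out predictions from a single fit plus one Newton step.

    Dropping point n changes the gradient at the fit by -x_n * resid_n and the
    Hessian by -x_n x_n^T; Sherman-Morrison gives the closed form below.
    """
    theta, H = fit_ridge(X, y, lam)
    H_inv_Xt = np.linalg.solve(H, X.T)                 # D x N
    leverage = np.einsum("nd,dn->n", X, H_inv_Xt)      # x_n^T H^{-1} x_n
    resid = X @ theta - y
    return X @ theta + leverage * resid / (1.0 - leverage)

def acv_low_rank(X, y, lam, K):
    """Same correction, with H^{-1} restricted to the top-K singular subspace.

    Assumption for illustration: H^{-1} ~ V_K diag(1 / (s_k^2 + lam)) V_K^T.
    A full SVD is used for brevity; a truncated solver would be the natural
    choice when D is large.
    """
    theta, _ = fit_ridge(X, y, lam)                    # the single model fit
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    Z = X @ Vt[:K].T                                   # N x K coordinates
    leverage = np.sum(Z**2 / (s[:K] ** 2 + lam), axis=1)
    resid = X @ theta - y
    return X @ theta + leverage * resid / (1.0 - leverage)
```

When the data are exactly rank $K$, both approximations should agree with exact leave-one-out up to numerical precision. When the data are only approximately low rank, the truncation drops non-negative terms from each leverage score, so `acv_low_rank` becomes a genuine approximation whose quality depends on how quickly the singular values decay.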

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques