RandALO: Out-of-sample risk estimation in no time flat

Parth Nobel; Daniel LeJeune; Emmanuel J. Candès

arXiv:2409.09781·math.ST·April 28, 2025

Open Access · 1 Repo

TL;DR

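RandALO is a randomized approximate leave-one-out risk estimator for high-dimensional models. It is consistent in high dimensions and cheaper to compute than K-fold cross-validation, sidestepping the usual trade-off between the high bias of K-fold CV and the computational cost of exact leave-one-out CV.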

Contribution

The paper presents RandALO, a novel randomized approximate leave-one-out risk estimator that is both consistent in high dimensions and less computationally intensive than K-fold cross-validation.

Findings

01 RandALO is a consistent estimator of out-of-sample risk in high-dimensional settings.

02 It is less computationally expensive than K-fold cross-validation.

03 Both claims are supported by extensive simulations on synthetic and real data.

Abstract

Estimating out-of-sample risk for models trained on large high-dimensional datasets is an expensive but essential part of the machine learning process, enabling practitioners to optimally tune hyperparameters. Cross-validation (CV) serves as the de facto standard for risk estimation but poorly trades off high bias ($K$-fold CV) for computational cost (leave-one-out CV). We propose a randomized approximate leave-one-out (RandALO) risk estimator that is not only a consistent estimator of risk in high dimensions but also less computationally expensive than $K$-fold CV. We support our claims with extensive simulations on synthetic and real data and provide a user-friendly Python package implementing RandALO available on PyPI as randalo and at https://github.com/cvxgrp/randalo.
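To make the cost trade-off in the abstract concrete, here is a minimal NumPy sketch for plain ridge regression. It is not the RandALO algorithm and does not use the randalo package. It only illustrates, under stated assumptions, why leave-one-out risk need not require n refits: for ridge with a fixed penalty and no intercept, the leave-one-out residual equals r_i / (1 - H_ii), where H is the hat matrix, and the diagonal of H can be estimated from a handful of random matrix-vector products. All function names below are hypothetical, and the Rademacher-probe diagonal estimator is a generic textbook technique chosen for illustration.

```python
import numpy as np


def ridge_loo_risk_refit(X, y, lam):
    """Leave-one-out squared-error risk by refitting ridge n times (slow baseline)."""
    n, p = X.shape
    errs = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        Xi, yi = X[keep], y[keep]
        beta = np.linalg.solve(Xi.T @ Xi + lam * np.eye(p), Xi.T @ yi)
        errs[i] = (y[i] - X[i] @ beta) ** 2
    return errs.mean()


def ridge_loo_risk_shortcut(X, y, lam, diag_H=None):
    """Leave-one-out risk from a single fit via r_i / (1 - H_ii).

    If diag_H is given (for example a randomized estimate), it is used
    in place of the exact diagonal of H = X (X^T X + lam I)^{-1} X^T.
    """
    n, p = X.shape
    G = X.T @ X + lam * np.eye(p)
    resid = y - X @ np.linalg.solve(G, X.T @ y)
    if diag_H is None:
        # Exact diagonal: H_ii = sum_j X[i, j] * (G^{-1} X^T)[j, i].
        diag_H = np.einsum("ij,ji->i", X, np.linalg.solve(G, X.T))
    return np.mean((resid / (1.0 - diag_H)) ** 2)


def randomized_diag(matvec, n, num_probes, rng):
    """Estimate diag(H) with Rademacher probes, using E[w * (H w)] = diag(H)."""
    est = np.zeros(n)
    for _ in range(num_probes):
        w = rng.choice([-1.0, 1.0], size=n)
        est += w * matvec(w)
    return est / num_probes


# Synthetic data (assumed purely for illustration).
rng = np.random.default_rng(0)
n, p, lam = 200, 10, 1.0
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + 0.5 * rng.standard_normal(n)

# H @ w computed without ever forming the n-by-n matrix H.
G = X.T @ X + lam * np.eye(p)
matvec = lambda w: X @ np.linalg.solve(G, X.T @ w)
diag_est = randomized_diag(matvec, n, num_probes=500, rng=rng)

risk_refit = ridge_loo_risk_refit(X, y, lam)
risk_exact = ridge_loo_risk_shortcut(X, y, lam)
risk_rand = ridge_loo_risk_shortcut(X, y, lam, diag_H=diag_est)
print(risk_refit, risk_exact, risk_rand)
```

The refit and shortcut values agree to numerical precision, while the randomized variant trades a small amount of accuracy for needing only matrix-vector products. Models other than ridge generally lack an exact shortcut of this form, which is where approximate leave-one-out methods come in.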

Code & Models

Repositories

cvxgrp/randalo (PyTorch, official)

Taxonomy

Topics: Statistical Methods and Inference