RandALO: Out-of-sample risk estimation in no time flat
Parth Nobel, Daniel LeJeune, Emmanuel J. Candès

TL;DR
RandALO is a randomized risk estimator for high-dimensional models that is consistent and cheaper to compute than K-fold cross-validation, improving on standard cross-validation in both accuracy and speed.
Contribution
The paper presents RandALO, a novel randomized approximate leave-one-out risk estimator that is both consistent in high dimensions and less computationally intensive than K-fold cross-validation.
Findings
RandALO provides accurate risk estimates in high-dimensional settings.
It is significantly faster than traditional K-fold cross-validation.
The method is validated on synthetic and real datasets.
Abstract
Estimating out-of-sample risk for models trained on large high-dimensional datasets is an expensive but essential part of the machine learning process, enabling practitioners to optimally tune hyperparameters. Cross-validation (CV) serves as the de facto standard for risk estimation but poorly trades off high bias (K-fold CV) for computational cost (leave-one-out CV). We propose a randomized approximate leave-one-out (RandALO) risk estimator that is not only a consistent estimator of risk in high dimensions but also less computationally expensive than K-fold CV. We support our claims with extensive simulations on synthetic and real data and provide a user-friendly Python package implementing RandALO available on PyPI as randalo and at https://github.com/cvxgrp/randalo.
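To make the bias versus cost trade-off in the abstract concrete, the sketch below compares K-fold CV with leave-one-out CV for plain ridge regression using NumPy. It is an illustration only and not the RandALO algorithm or the randalo package API: the function names, the synthetic data, and the choice of ridge regression are all assumptions made for this example. For ridge, leave-one-out residuals can be recovered exactly from a single fit through the diagonal of the hat matrix, which hints at why leave-one-out style estimators need not cost n refits.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Solve the ridge normal equations (X^T X + lam I) beta = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def kfold_risk(X, y, lam, k=5):
    """K-fold CV estimate of squared-error risk; needs k refits."""
    n = X.shape[0]
    sq_err = np.empty(n)
    for idx in np.array_split(np.arange(n), k):
        mask = np.ones(n, dtype=bool)
        mask[idx] = False  # hold out this fold
        beta = ridge_fit(X[mask], y[mask], lam)
        sq_err[idx] = (y[idx] - X[idx] @ beta) ** 2
    return sq_err.mean()

def loo_risk_bruteforce(X, y, lam):
    """Leave-one-out CV as n-fold CV; needs n refits."""
    return kfold_risk(X, y, lam, k=X.shape[0])

def loo_risk_shortcut(X, y, lam):
    """Exact leave-one-out risk for ridge from a single fit.

    Uses the identity: LOO residual_i = residual_i / (1 - H_ii),
    where H = X (X^T X + lam I)^{-1} X^T is the hat matrix.
    """
    d = X.shape[1]
    G = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T)  # shape (d, n)
    h = np.einsum("ij,ji->i", X, G)                      # diagonal of H = X G
    resid = y - X @ (G @ y)                              # in-sample residuals
    return np.mean((resid / (1.0 - h)) ** 2)

# Hypothetical synthetic problem with n comparable to d.
rng = np.random.default_rng(0)
n, d, lam = 60, 40, 1.0
X = rng.standard_normal((n, d))
beta_true = rng.standard_normal(d) / np.sqrt(d)
y = X @ beta_true + 0.5 * rng.standard_normal(n)

risk_5fold = kfold_risk(X, y, lam, k=5)
risk_loo_slow = loo_risk_bruteforce(X, y, lam)
risk_loo_fast = loo_risk_shortcut(X, y, lam)
print(risk_5fold, risk_loo_slow, risk_loo_fast)
```

The two leave-one-out values agree up to floating-point error, while the 5-fold estimate generally differs because each fold trains on only 80% of the data. The closed-form shortcut is specific to ridge with squared loss; how RandALO handles general losses and regularizers, and where randomization enters, is described in the paper itself rather than in this sketch.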
Taxonomy
Topics: Statistical Methods and Inference
