Approximate Leave-one-out Cross Validation for Regression with $\ell_1$ Regularizers (extended version)

Arnab Auddy; Haolin Zou; Kamiar Rahnama Rad; Arian Maleki

arXiv:2310.17629·math.ST·October 27, 2023·1 cites

Open Access

TL;DR

This paper develops a theoretical framework for approximate leave-one-out cross validation (ALO) in generalized linear models with non-differentiable regularizers, with a focus on $\ell_1$ regularization, and shows that ALO stays close to exact leave-one-out cross validation (LO) as the number of features grows.

Contribution

The paper introduces a theory that bounds the error |ALO - LO| for models with non-differentiable regularizers, extending previous results for differentiable regularizers to a broader class of problems.

Findings

01 |ALO - LO| converges to zero as the number of features p increases, with n/p and SNR held fixed

02 Theoretical bounds relate the error to the size of leave-i-out perturbations in active sets and to the sample size

03 Applicable to $\ell_1$-regularized problems in high-dimensional settings
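The quantities in these findings can be written out as follows. This is a sketch in assumed notation: the symbols $\ell$ (training loss), $\phi$ (evaluation loss), $\lambda$ (regularization strength), and $\hat\beta_{/i}$ (the fit with observation $i$ removed) are not defined on this page and are introduced here only for illustration.

```latex
% Assumed notation: (y_j, x_j), j = 1..n are the observations, x_j in R^p.
\hat\beta = \arg\min_{\beta \in \mathbb{R}^p}
    \sum_{j=1}^{n} \ell\bigl(y_j, x_j^\top \beta\bigr) + \lambda \|\beta\|_1,
\qquad
\hat\beta_{/i} = \arg\min_{\beta \in \mathbb{R}^p}
    \sum_{j \neq i} \ell\bigl(y_j, x_j^\top \beta\bigr) + \lambda \|\beta\|_1 .

% Exact leave-one-out estimate of the out-of-sample error: n separate refits.
\mathrm{LO} = \frac{1}{n} \sum_{i=1}^{n} \phi\bigl(y_i, x_i^\top \hat\beta_{/i}\bigr)

% Finding 01, restated: with n/p and SNR held fixed,
|\mathrm{ALO} - \mathrm{LO}| \to 0 \quad \text{as } p \to \infty .
```

ALO replaces each $\hat\beta_{/i}$ with an approximation computed from the single full-data fit $\hat\beta$, which is what makes it cheaper than the $n$ refits that LO requires.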

Abstract

The out-of-sample error (OO) is the main quantity of interest in risk estimation and model selection. Leave-one-out cross validation (LO) offers a (nearly) distribution-free yet computationally demanding approach to estimate OO. Recent theoretical work showed that approximate leave-one-out cross validation (ALO) is a computationally efficient and statistically reliable estimate of LO (and OO) for generalized linear models with differentiable regularizers. For problems involving non-differentiable regularizers, despite significant empirical evidence, the theoretical understanding of ALO's error remains unknown. In this paper, we present a novel theory for a wide class of problems in the generalized linear model family with non-differentiable regularizers. We bound the error |ALO - LO| in terms of intuitive metrics such as the size of leave-i-out perturbations in active sets, sample size…
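The gap between LO and ALO described in the abstract can be illustrated on a small lasso problem. The sketch below assumes squared loss with the objective 0.5 * ||y - X b||^2 + lam * ||b||_1, and uses a commonly cited closed form of ALO for that case: each full-data residual is divided by 1 - H_ii, where H is the projection onto the columns in the active set. That formula, the plain coordinate-descent solver, the function names, and the simulated data are all assumptions made for illustration and are not taken from this page.

```python
import numpy as np

def soft_threshold(z, t):
    # Scalar soft-thresholding operator used by the lasso coordinate update.
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=5000, tol=1e-10):
    """Minimize 0.5*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    r = y.astype(float).copy()  # running residual y - X b
    for _ in range(n_iter):
        max_delta = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            old = b[j]
            rho = X[:, j] @ r + col_sq[j] * old
            new = soft_threshold(rho, lam) / col_sq[j]
            if new != old:
                r -= X[:, j] * (new - old)
                b[j] = new
                max_delta = max(max_delta, abs(new - old))
        if max_delta < tol:
            break
    return b

def exact_lo(X, y, lam):
    """Exact leave-one-out squared error: refit n times, once per held-out row."""
    n = X.shape[0]
    errs = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        b_i = lasso_cd(X[keep], y[keep], lam)
        errs[i] = (y[i] - X[i] @ b_i) ** 2
    return errs.mean()

def approx_lo(X, y, lam):
    """ALO sketch for squared-loss lasso: one fit plus a leverage correction."""
    b = lasso_cd(X, y, lam)
    resid = y - X @ b
    active = np.flatnonzero(b)  # indices of nonzero coefficients
    if active.size == 0:
        return np.mean(resid ** 2)
    XA = X[:, active]
    # Diagonal of H = XA (XA^T XA)^{-1} XA^T, the projection onto active columns.
    H = XA @ np.linalg.solve(XA.T @ XA, XA.T)
    h = np.diag(H)
    return np.mean((resid / (1.0 - h)) ** 2)

# Simulated data (hypothetical sizes and signal; not from the paper).
rng = np.random.default_rng(0)
n, p = 40, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:4] = 1.0
y = X @ beta_true + 0.5 * rng.standard_normal(n)

lam = 5.0
print("LO :", exact_lo(X, y, lam))
print("ALO:", approx_lo(X, y, lam))
```

For squared loss, this shortcut reproduces the leave-i-out residual exactly whenever removing observation i leaves the active set and the signs of its coefficients unchanged, so any discrepancy between the two printed values comes from observations whose removal perturbs the active set. That is consistent with the abstract's statement that the error is bounded in terms of leave-i-out perturbations in active sets.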

Taxonomy

Topics: Statistical Methods and Inference · Fault Detection and Control Systems · Control Systems and Identification