Surprises in High-Dimensional Ridgeless Least Squares Interpolation

Trevor Hastie; Andrea Montanari; Saharon Rosset; Ryan J. Tibshirani

arXiv:1903.08560 · math.ST · September 12, 2022 · 70 cites

Open Access

TL;DR

This paper analyzes minimum-norm ("ridgeless") least squares interpolation in high dimensions. It shows double descent in the prediction risk and potential benefits of overparametrization under both a linear and a nonlinear feature model, and relates these results to behavior observed in neural networks.

Contribution

It provides a precise theoretical analysis of ridgeless interpolation in high dimensions, explaining phenomena observed in neural networks and kernel methods.

Findings

1. Double descent behavior in prediction risk
2. Benefits of overparametrization in high-dimensional models
3. Theoretical connection to neural network phenomena

Abstract

Interpolators -- estimators that achieve zero training error -- have attracted growing attention in machine learning, mainly because state-of-the-art neural networks appear to be models of this type. In this paper, we study minimum $\ell_{2}$ norm ("ridgeless") interpolation in high-dimensional least squares regression. We consider two different models for the feature distribution: a linear model, where the feature vectors $x_{i} \in \mathbb{R}^{p}$ are obtained by applying a linear transform to a vector of i.i.d. entries, $x_{i} = \Sigma^{1/2} z_{i}$ (with $z_{i} \in \mathbb{R}^{p}$); and a nonlinear model, where the feature vectors are obtained by passing the input through a random one-layer neural network, $x_{i} = \varphi(W z_{i})$ (with $z_{i} \in \mathbb{R}^{d}$, $W \in \mathbb{R}^{p \times d}$ a matrix of i.i.d. entries, and $\varphi$ an activation function acting componentwise on $W z_{i}$). We…
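
As an illustration of the setup described in the abstract, the following minimal NumPy sketch builds features under both models and computes the minimum $\ell_{2}$ norm least squares solution via the pseudoinverse. It is a sketch under stated assumptions, not the authors' code: the function names, the Gaussian choice for the i.i.d. entries, the $1/\sqrt{d}$ scaling of $W$, the `tanh` activation, the identity covariance, and the pure-noise responses are all illustrative choices that the abstract does not specify.

```python
import numpy as np


def min_norm_interpolator(X, y):
    """Minimum l2-norm least squares solution, beta = pinv(X) @ y.

    When X has full row rank (typically the case for p > n),
    this beta fits the training data exactly.
    """
    return np.linalg.pinv(X) @ y


def linear_features(n, p, sigma_sqrt, rng):
    """Rows x_i = Sigma^{1/2} z_i, with z_i having i.i.d. entries.

    Assumption: standard Gaussian entries for z_i.
    """
    Z = rng.standard_normal((n, p))
    return Z @ sigma_sqrt.T


def nonlinear_features(n, p, d, rng, phi=np.tanh):
    """Rows x_i = phi(W z_i), a random one-layer network.

    Assumptions: Gaussian z_i and W, W scaled by 1/sqrt(d), tanh activation.
    """
    Z = rng.standard_normal((n, d))
    W = rng.standard_normal((p, d)) / np.sqrt(d)
    return phi(Z @ W.T)


rng = np.random.default_rng(0)
n, p, d = 50, 200, 30          # overparametrized regime: p > n
y = rng.standard_normal(n)     # placeholder responses, for illustration only

# Linear feature model with Sigma = I (an illustrative choice).
X_lin = linear_features(n, p, np.eye(p), rng)
beta_lin = min_norm_interpolator(X_lin, y)
train_err_lin = np.max(np.abs(X_lin @ beta_lin - y))

# Nonlinear (random features) model.
X_nl = nonlinear_features(n, p, d, rng)
beta_nl = min_norm_interpolator(X_nl, y)
train_err_nl = np.max(np.abs(X_nl @ beta_nl - y))

print("max training residual (linear model):   ", train_err_lin)
print("max training residual (nonlinear model):", train_err_nl)
```

With $p > n$, both training residuals should be at the level of floating-point error, which is what "interpolation" means here. Among all coefficient vectors that fit the data exactly, the pseudoinverse picks the one with the smallest $\ell_{2}$ norm.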

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Neural Networks and Applications · Image and Signal Denoising Methods