Surprises in High-Dimensional Ridgeless Least Squares Interpolation
Trevor Hastie, Andrea Montanari, Saharon Rosset, Ryan J. Tibshirani

TL;DR
This paper analyzes high-dimensional ridgeless least squares interpolation, revealing phenomena like double descent and overparametrization benefits, in models with linear and nonlinear feature transformations, connecting theory with neural network observations.
Contribution
It provides a precise theoretical analysis of ridgeless interpolation in high dimensions, explaining phenomena observed in neural networks and kernel methods.
Findings
Double descent behavior in prediction risk
Benefits of overparametrization in high-dimensional models
Theoretical connection to neural network phenomena
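The double descent finding can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not taken from this page: isotropic Gaussian features, a well-specified linear response, and illustrative values for the noise level, signal norm, and sample size. The function name `min_norm_test_risk` is hypothetical. It estimates the out-of-sample prediction risk of the minimum-norm least squares fit at several ratios p/n, which should spike near p/n = 1 and fall again once p exceeds n.

```python
import numpy as np

def min_norm_test_risk(n, p, sigma=0.5, signal_norm=1.0, n_trials=50, seed=0):
    """Monte Carlo estimate of the prediction risk of min-norm least squares.

    Assumes isotropic Gaussian features (identity covariance), so the
    out-of-sample risk E[(x^T beta_hat - x^T beta)^2] equals ||beta_hat - beta||^2.
    """
    rng = np.random.default_rng(seed)
    risks = []
    for _ in range(n_trials):
        # True coefficients with a fixed norm (illustrative choice).
        beta = rng.standard_normal(p)
        beta *= signal_norm / np.linalg.norm(beta)
        X = rng.standard_normal((n, p))
        y = X @ beta + sigma * rng.standard_normal(n)
        # pinv gives ordinary least squares when p < n and the
        # minimum l2-norm interpolator when p > n.
        beta_hat = np.linalg.pinv(X) @ y
        risks.append(np.sum((beta_hat - beta) ** 2))
    return float(np.mean(risks))

n = 40
ratios = [0.25, 0.5, 0.9, 1.0, 1.1, 2.0, 4.0]
risk_by_ratio = {g: min_norm_test_risk(n, int(round(g * n))) for g in ratios}
for g, r in risk_by_ratio.items():
    print(f"p/n = {g:4.2f}  estimated risk = {r:.3f}")
```

The estimate at p/n = 1 is noisy because the design matrix is nearly singular there, so only the qualitative shape (a peak at the interpolation threshold, lower risk on both sides) should be read from the output.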
Abstract
Interpolators -- estimators that achieve zero training error -- have attracted growing attention in machine learning, mainly because state-of-the-art neural networks appear to be models of this type. In this paper, we study minimum norm ("ridgeless") interpolation in high-dimensional least squares regression. We consider two different models for the feature distribution: a linear model, where the feature vectors $x_i \in \mathbb{R}^p$ are obtained by applying a linear transform to a vector of i.i.d. entries, $x_i = \Sigma^{1/2} z_i$ (with $z_i \in \mathbb{R}^p$); and a nonlinear model, where the feature vectors are obtained by passing the input through a random one-layer neural network, $x_i = \varphi(W z_i)$ (with $z_i \in \mathbb{R}^d$, $W \in \mathbb{R}^{p \times d}$ a matrix of i.i.d. entries, and $\varphi$ an activation function acting componentwise on $W z_i$). We…
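The two feature models and the minimum-norm interpolator described in the abstract can be sketched in a few lines of NumPy. This is an illustration only, not the authors' code: the function names, the dimensions, the diagonal covariance, the `tanh` activation, and the 1/sqrt(d) scaling of `W` are all assumptions chosen for the example. The interpolator is computed with the pseudoinverse, which returns the least squares solution of smallest l2 norm when there are more features than samples.

```python
import numpy as np

def linear_features(Z, Sigma_sqrt):
    """Linear model: row i is x_i = Sigma^{1/2} z_i."""
    return Z @ Sigma_sqrt.T

def nonlinear_features(Z, W, activation=np.tanh):
    """Nonlinear model: row i is x_i = activation(W z_i), componentwise."""
    return activation(Z @ W.T)

def min_norm_interpolator(X, y):
    """Minimum l2-norm least squares solution via the pseudoinverse."""
    return np.linalg.pinv(X) @ y

rng = np.random.default_rng(0)
n, p, d = 20, 60, 30            # p > n: overparametrized (illustrative sizes)
y = rng.standard_normal(n)      # arbitrary responses, just to test interpolation

# Linear model with an assumed diagonal covariance.
Sigma_sqrt = np.diag(np.sqrt(np.linspace(0.5, 2.0, p)))
X_lin = linear_features(rng.standard_normal((n, p)), Sigma_sqrt)
beta_lin = min_norm_interpolator(X_lin, y)

# Nonlinear model: random one-layer network with i.i.d. Gaussian weights.
W = rng.standard_normal((p, d)) / np.sqrt(d)
X_nl = nonlinear_features(rng.standard_normal((n, d)), W)
beta_nl = min_norm_interpolator(X_nl, y)

# Both fits should reproduce the training responses up to rounding error.
residual_lin = np.max(np.abs(X_lin @ beta_lin - y))
residual_nl = np.max(np.abs(X_nl @ beta_nl - y))
print(f"max training residual (linear):    {residual_lin:.2e}")
print(f"max training residual (nonlinear): {residual_nl:.2e}")
```

When the design matrix has full row rank, the same solution can be written as `X.T @ np.linalg.solve(X @ X.T, y)`, which makes it explicit that the minimum-norm interpolator lies in the row space of the features.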
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Neural Networks and Applications · Image and Signal Denoising Methods
