The generalization error of random features regression: Precise asymptotics and double descent curve

Song Mei; Andrea Montanari

arXiv:1908.05355 · math.ST · December 14, 2020 · 248 cites

Open Access

TL;DR

This paper computes the precise asymptotics of the test error of random features regression in the high-dimensional limit and shows that the resulting formula reproduces the full double descent curve.

Contribution

The paper gives the first analytically tractable model that captures all the features of the double descent phenomenon without assuming ad hoc misspecification structures.

Findings

01. Test error exhibits double descent behavior in high-dimensional random features regression.

02. Precise asymptotics of the test error are derived in the limit of large dimensions.

03. The model captures all features of the double descent curve without misspecification assumptions.
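The setting behind these findings can be written compactly. The block below is a sketch of the random features ridge regression estimator and the proportional limit; the notation (features $\theta_i$, activation $\sigma$, ratios $\psi_1$, $\psi_2$) is an assumption reconstructed from the paper's standard formulation and does not appear on this page.

```latex
% Random features model: N random first-layer weights \theta_i, fitted coefficients a_i
f(x; a) = \sum_{i=1}^{N} a_i \, \sigma\!\left( \frac{\langle \theta_i, x \rangle}{\sqrt{d}} \right)

% Ridge regression on n samples (x_j, y_j) with regularization \lambda
\hat{a}(\lambda) = \arg\min_{a \in \mathbb{R}^N}
  \left\{ \frac{1}{n} \sum_{j=1}^{n} \bigl( y_j - f(x_j; a) \bigr)^2
          + \frac{N \lambda}{d} \, \| a \|_2^2 \right\}

% Proportional asymptotics: all three dimensions diverge at fixed ratios
N, n, d \to \infty, \qquad \frac{N}{d} \to \psi_1, \qquad \frac{n}{d} \to \psi_2
```

In this parametrization the interpolation threshold sits at $\psi_1 = \psi_2$, that is, where the number of features $N$ matches the number of samples $n$.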

Abstract

Deep learning methods operate in regimes that defy the traditional statistical mindset. Neural network architectures often contain more parameters than training samples, and are so rich that they can interpolate the observed labels, even if the latter are replaced by pure noise. Despite their huge complexity, the same architectures achieve small generalization error on real data. This phenomenon has been rationalized in terms of a so-called 'double descent' curve. As the model complexity increases, the test error follows the usual U-shaped curve at the beginning, first decreasing and then peaking around the interpolation threshold (when the model achieves vanishing training error). However, it descends again as model complexity exceeds this threshold. The global minimum of the test error is found above the interpolation threshold, often in the extreme overparametrization regime in…
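The shape of the curve described above can be checked numerically at small scale. The following is a minimal sketch, not the paper's experiment: the linear target, the noise level, the ReLU activation, the dimensions, and the helper names `sample_sphere` and `rf_test_error` are all assumptions chosen for illustration, and the ridge penalty uses a simple `lam * n_train` scaling rather than any scaling taken from the paper.

```python
import numpy as np


def sample_sphere(rng, m, d):
    """Draw m points uniformly from the sphere of radius sqrt(d) in R^d."""
    g = rng.standard_normal((m, d))
    return np.sqrt(d) * g / np.linalg.norm(g, axis=1, keepdims=True)


def rf_test_error(n_features, n_train=200, n_test=1000, d=20,
                  noise=0.5, lam=1e-6, seed=0):
    """Test MSE of near-ridgeless random features regression (illustrative)."""
    rng = np.random.default_rng(seed)

    # Hypothetical data model: noisy linear target on the sphere.
    beta = rng.standard_normal(d) / np.sqrt(d)
    X_train = sample_sphere(rng, n_train, d)
    X_test = sample_sphere(rng, n_test, d)
    y_train = X_train @ beta + noise * rng.standard_normal(n_train)
    y_test = X_test @ beta  # noiseless targets for evaluation

    # Random first layer (never trained), ReLU activation assumed.
    Theta = sample_sphere(rng, n_features, d)
    Z_train = np.maximum(X_train @ Theta.T / np.sqrt(d), 0.0)
    Z_test = np.maximum(X_test @ Theta.T / np.sqrt(d), 0.0)

    # Ridge solution in dual form: a = Z^T (Z Z^T + c I)^{-1} y,
    # which equals (Z^T Z + c I)^{-1} Z^T y for c > 0.
    K = Z_train @ Z_train.T + lam * n_train * np.eye(n_train)
    a = Z_train.T @ np.linalg.solve(K, y_train)

    return float(np.mean((Z_test @ a - y_test) ** 2))


# Sweep the number of features below, at, and above n_train = 200.
for N in (40, 200, 2000):
    print(f"N = {N:5d}  test MSE = {rf_test_error(N):.3f}")
```

With a very small penalty, the error is expected to spike near `n_features == n_train` and to fall again as the number of features grows. Increasing `lam` should flatten the peak, which is one way to see why regularization matters near the interpolation threshold.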

Taxonomy

Topics: Random Matrices and Applications · Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques

Methods: Linear Regression