The Interpolation Phase Transition in Neural Networks: Memorization and Generalization under Lazy Training
Andrea Montanari, Yiqiao Zhong

TL;DR
This paper analyzes the interpolation and generalization properties of overparametrized two-layer neural networks in the neural tangent regime, revealing how they can interpolate data and generalize well through kernel ridge regression approximations.
Contribution
It provides a theoretical characterization of the eigenstructure of the NT kernel and the generalization error in the overparametrized regime, connecting neural networks to polynomial ridge regression.
Findings
The eigenstructure of the empirical NT kernel is characterized in the overparametrized regime Nd ≫ n
The minimum eigenvalue of the NT kernel remains bounded away from zero when Nd ≫ n
The test error is well approximated by the kernel ridge regression error in the overparametrized regime
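
The minimum-eigenvalue finding can be illustrated numerically. The following is a minimal sketch, not the paper's experiment. It assumes ReLU activation, covariates drawn uniformly from the sphere of radius sqrt(d), first-layer weights drawn uniformly from the unit sphere, and a 1/(Nd) normalization of the kernel. The helper names (`sample_sphere`, `empirical_nt_kernel`, `min_eigenvalue`, `interpolate`) are hypothetical and chosen only for this illustration.

```python
import numpy as np

def sample_sphere(rng, m, d, radius=1.0):
    """Draw m points uniformly from the sphere of the given radius in R^d."""
    G = rng.standard_normal((m, d))
    return radius * G / np.linalg.norm(G, axis=1, keepdims=True)

def empirical_nt_kernel(X, W):
    """n x n kernel of the first-layer NT features of a two-layer ReLU network.

    Feature map: x -> [sigma'(<w_k, x>) x]_{k <= N} in R^{N d}. Second-layer
    signs a_k = +/-1 cancel in the kernel. The 1/(N d) scaling is an
    assumption made for this sketch.
    """
    N, d = W.shape
    D = (X @ W.T > 0).astype(float)   # sigma'(<w_k, x_i>) for ReLU, shape (n, N)
    return (D @ D.T) * (X @ X.T) / (N * d)

def min_eigenvalue(K):
    """Smallest eigenvalue of a symmetric matrix (eigvalsh sorts ascending)."""
    return float(np.linalg.eigvalsh(K)[0])

def interpolate(K, y):
    """Ridgeless kernel fit: coefficients alpha such that K @ alpha = y."""
    return np.linalg.solve(K, y)

rng = np.random.default_rng(0)
n, d = 100, 20
X = sample_sphere(rng, n, d, radius=np.sqrt(d))
y = rng.choice([-1.0, 1.0], size=n)   # purely random labels

for N in (2, 200):                    # N*d = 40 < n, then N*d = 4000 >> n
    W = sample_sphere(rng, N, d)
    K = empirical_nt_kernel(X, W)
    print(f"N*d = {N * d:5d}, n = {n}: lambda_min = {min_eigenvalue(K):.3e}")
    if N * d > n:
        alpha = interpolate(K, y)
        print("  max training residual:", np.abs(K @ alpha - y).max())
```

When Nd < n the kernel matrix has rank at most Nd, so its smallest eigenvalue is numerically zero and random labels cannot be interpolated in general. When Nd ≫ n the smallest eigenvalue is strictly positive in this sketch, and solving the linear system fits the random labels exactly up to numerical precision.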
Abstract
Modern neural networks are often operated in a strongly overparametrized regime: they comprise so many parameters that they can interpolate the training set, even if actual labels are replaced by purely random ones. Despite this, they achieve good prediction error on unseen data: interpolating the training set does not lead to a large generalization error. Further, overparametrization appears to be beneficial in that it simplifies the optimization landscape. Here we study these phenomena in the context of two-layer neural networks in the neural tangent (NT) regime. We consider a simple data model, with isotropic covariate vectors in d dimensions, and N hidden neurons. We assume that both the sample size n and the dimension d are large, and they are polynomially related. Our first main result is a characterization of the eigenstructure of the empirical NT kernel in the…
Taxonomy
Topics: Neural Networks and Applications · Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques
