The Interpolation Phase Transition in Neural Networks: Memorization and Generalization under Lazy Training

Andrea Montanari; Yiqiao Zhong

arXiv:2007.12826 · stat.ML · June 10, 2022 · 6 cites

Open Access

TL;DR

This paper analyzes interpolation and generalization in overparametrized two-layer neural networks in the neural tangent regime, showing when such networks can interpolate the training data and that their test error is well approximated by that of kernel ridge regression.

Contribution

It provides a theoretical characterization of the eigenstructure of the NT kernel and the generalization error in the overparametrized regime, connecting neural networks to polynomial ridge regression.

Findings

01 Eigenstructure of NT kernel is well-characterized in overparametrized regime

02 Minimum eigenvalue of NT kernel remains bounded away from zero when $Nd \gg n$

03 Test error approximates kernel ridge regression error in the overparametrized regime
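The second finding can be probed numerically. The following is a minimal sketch, not the paper's construction: it assumes a tanh activation, inputs uniform on the sphere of radius $\sqrt{d}$, first-layer weights uniform on the unit sphere, and a kernel normalized as $K_{ij} = \frac{\langle x_i, x_j\rangle}{d}\cdot\frac{1}{N}\sum_k \sigma'(\langle w_k, x_i\rangle)\,\sigma'(\langle w_k, x_j\rangle)$. The helper names `sphere`, `empirical_nt_kernel`, and `min_eigenvalue` are hypothetical. It compares the smallest eigenvalue when $Nd > n$ with a case where $Nd < n$, in which the kernel matrix has rank at most $Nd$ and must be singular.

```python
import numpy as np

def sphere(rng, rows, dim, radius):
    """Draw `rows` points uniformly from the sphere of the given radius in R^dim."""
    G = rng.standard_normal((rows, dim))
    return radius * G / np.linalg.norm(G, axis=1, keepdims=True)

def empirical_nt_kernel(X, W):
    """Empirical NT kernel w.r.t. first-layer weights (assumed normalization).

    K[i, j] = (<x_i, x_j> / d) * (1/N) * sum_k s'(<w_k, x_i>) * s'(<w_k, x_j>),
    with s = tanh, so s'(z) = 1 - tanh(z)^2.
    """
    n, d = X.shape
    N = W.shape[0]
    D = 1.0 - np.tanh(X @ W.T) ** 2        # s'(<w_k, x_i>), shape (n, N)
    return (X @ X.T / d) * (D @ D.T / N)   # elementwise product of two Gram matrices

def min_eigenvalue(n, d, N, seed=0):
    """Smallest eigenvalue of the empirical NT kernel on n random inputs."""
    rng = np.random.default_rng(seed)
    X = sphere(rng, n, d, np.sqrt(d))      # inputs (assumed distribution)
    W = sphere(rng, N, d, 1.0)             # first-layer weights (assumed distribution)
    K = empirical_nt_kernel(X, W)
    return np.linalg.eigvalsh(K)[0]        # eigvalsh returns eigenvalues in ascending order

lam_over = min_eigenvalue(n=200, d=20, N=100)   # Nd = 2000 > n = 200
lam_under = min_eigenvalue(n=200, d=5, N=10)    # Nd = 50 < n = 200, rank at most 50
print("min eigenvalue, Nd > n:", lam_over)
print("min eigenvalue, Nd < n:", lam_under)
```

The sizes here are small and chosen only for speed; the paper's statement concerns the regime where $n$ and $d$ are both large, so this is an illustration rather than a verification.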

Abstract

Modern neural networks are often operated in a strongly overparametrized regime: they comprise so many parameters that they can interpolate the training set, even if actual labels are replaced by purely random ones. Despite this, they achieve good prediction error on unseen data: interpolating the training set does not lead to a large generalization error. Further, overparametrization appears to be beneficial in that it simplifies the optimization landscape. Here we study these phenomena in the context of two-layer neural networks in the neural tangent (NT) regime. We consider a simple data model, with isotropic covariate vectors in $d$ dimensions, and $N$ hidden neurons. We assume that both the sample size $n$ and the dimension $d$ are large, and they are polynomially related. Our first main result is a characterization of the eigenstructure of the empirical NT kernel in the…
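The interpolation of purely random labels mentioned in the abstract can be illustrated with ridge regression on neural tangent features. This is a hedged sketch under assumptions not taken from the source: a tanh activation, the feature map $\Phi(x) = \frac{1}{\sqrt{Nd}}\big[\sigma'(\langle w_k, x\rangle)\,x\big]_{k \le N} \in \mathbb{R}^{Nd}$, a linearized model $f(x) = \langle \theta, \Phi(x)\rangle$ with the network output at initialization omitted, and hypothetical helper names `nt_features` and `nt_ridge_fit`. Setting the regularization to zero gives the minimum-$\ell_2$-norm interpolant whenever the kernel matrix $\Phi\Phi^\top$ is invertible.

```python
import numpy as np

def nt_features(X, W):
    """Map each input to its NT feature vector in R^{N*d} (assumed normalization)."""
    n, d = X.shape
    N = W.shape[0]
    D = 1.0 - np.tanh(X @ W.T) ** 2          # s'(<w_k, x_i>) for s = tanh, shape (n, N)
    Phi = D[:, :, None] * X[:, None, :]      # shape (n, N, d): s'(<w_k, x_i>) * x_i
    return Phi.reshape(n, N * d) / np.sqrt(N * d)

def nt_ridge_fit(Phi, y, lam):
    """Ridge solution theta = Phi^T (Phi Phi^T + lam I)^{-1} y.

    With lam = 0 this is the minimum-l2-norm interpolant, provided
    Phi Phi^T is invertible.
    """
    n = Phi.shape[0]
    alpha = np.linalg.solve(Phi @ Phi.T + lam * np.eye(n), y)
    return Phi.T @ alpha

rng = np.random.default_rng(0)
n, d, N = 100, 10, 50                        # Nd = 500 > n = 100
X = rng.standard_normal((n, d))
X *= np.sqrt(d) / np.linalg.norm(X, axis=1, keepdims=True)   # inputs on sphere of radius sqrt(d)
W = rng.standard_normal((N, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)                # weights on the unit sphere
y = rng.choice([-1.0, 1.0], size=n)          # purely random labels

Phi = nt_features(X, W)
theta_interp = nt_ridge_fit(Phi, y, lam=0.0)   # interpolation
theta_ridge = nt_ridge_fit(Phi, y, lam=0.1)    # regularized fit
train_err_interp = np.max(np.abs(Phi @ theta_interp - y))
train_err_ridge = np.max(np.abs(Phi @ theta_ridge - y))
print("max training residual, lam = 0:  ", train_err_interp)
print("max training residual, lam = 0.1:", train_err_ridge)
```

Solving the $n \times n$ system in `nt_ridge_fit` rather than an $Nd \times Nd$ one is a design choice that keeps the cost governed by the sample size in the overparametrized case.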

Taxonomy

Topics: Neural Networks and Applications · Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques