Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers

Zeyuan Allen-Zhu; Yuanzhi Li; Yingyu Liang

arXiv:1811.04918·cs.LG·June 2, 2020·172 cites

Open Access

TL;DR

This paper proves that overparameterized neural networks trained by SGD can learn notable concept classes, including two and three-layer networks with fewer parameters and smooth activations, in polynomial time and from polynomially many samples, with an analysis that goes beyond NTK linearization.

Contribution

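It establishes a new notion of quadratic approximation of the neural network, which takes the analysis beyond the NTK (neural tangent kernel) linearization used in prior works and shows that concept classes given by smaller two and three-layer networks with smooth activations are learnable.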
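
For intuition only, the contrast between a linearization and a quadratic approximation can be sketched with a generic Taylor expansion of a network output around its initialization. The symbols $f$, $W_0$, and $\Delta$ are notation assumed here, not taken from the paper, and the paper's own construction may differ from this textbook expansion.

```latex
% NTK-style linearization: first-order expansion in the weight change \Delta
f(W_0 + \Delta; x) \approx f(W_0; x) + \langle \nabla_W f(W_0; x), \Delta \rangle

% Generic second-order expansion: adds a term quadratic in \Delta
f(W_0 + \Delta; x) \approx f(W_0; x) + \langle \nabla_W f(W_0; x), \Delta \rangle
  + \tfrac{1}{2}\, \Delta^\top \nabla_W^2 f(W_0; x)\, \Delta
```

In the first line the model is linear in $\Delta$, so learning reduces to a kernel method. The second line keeps an additional term that depends on products of weight changes, which a purely linearized analysis discards.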

Findings

01

Overparameterized networks can learn notable concept classes, including two and three-layer networks with fewer parameters and smooth activations.

02

SGD can train these networks efficiently in polynomial time.

03

Sample complexity can be almost independent of the number of parameters in the network.
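
The setting behind these findings can be illustrated with a toy teacher-student experiment. The sketch below is not the paper's algorithm or experimental setup: the function names (`make_data`, `train_student`), the tanh teacher, the bias-free ReLU student with fixed output weights, and all hyperparameters are assumptions chosen for illustration. A wide two-layer network is trained by mini-batch SGD to fit labels produced by a much smaller two-layer network with a smooth activation.

```python
import numpy as np

def make_data(n=500, d=10, k=3, seed=0):
    """Unit-norm inputs labelled by a small tanh teacher (k hidden units)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    W_star = rng.normal(size=(k, d))            # teacher hidden weights
    a_star = rng.normal(size=k) / np.sqrt(k)    # teacher output weights
    y = np.tanh(X @ W_star.T) @ a_star
    return X, y

def half_mse(W, a, X, y):
    """0.5 * mean squared error of the ReLU student on (X, y)."""
    pred = np.maximum(X @ W.T, 0.0) @ a
    return 0.5 * np.mean((pred - y) ** 2)

def train_student(X, y, m=512, lr=0.5, epochs=40, batch=32, seed=1):
    """Mini-batch SGD on the hidden layer of a width-m ReLU student."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(size=(m, d))                        # trained hidden weights
    a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)   # output weights, kept fixed
    losses = [half_mse(W, a, X, y)]                    # loss at initialization
    for _ in range(epochs):
        perm = rng.permutation(n)
        for start in range(0, n, batch):
            idx = perm[start:start + batch]
            Xb, yb = X[idx], y[idx]
            H = Xb @ W.T                               # pre-activations, shape (b, m)
            err = np.maximum(H, 0.0) @ a - yb          # residuals, shape (b,)
            # Gradient of 0.5 * mean(err^2) with respect to W, shape (m, d)
            G = ((err[:, None] * (H > 0)) * a).T @ Xb / len(idx)
            W -= lr * G
        losses.append(half_mse(W, a, X, y))            # loss after each epoch
    return W, a, losses

X, y = make_data()
W, a, losses = train_student(X, y)
print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.4f}")
```

The student here has 512 hidden units against the teacher's 3, so it is heavily overparameterized relative to the target. Only the training loss is tracked; measuring generalization would require evaluating on fresh samples drawn the same way.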

Abstract

The fundamental learning theory behind neural networks remains largely open. What classes of functions can neural networks actually learn? Why doesn't the trained network overfit when it is overparameterized? In this work, we prove that overparameterized neural networks can learn some notable concept classes, including two and three-layer networks with fewer parameters and smooth activations. Moreover, the learning can be simply done by SGD (stochastic gradient descent) or its variants in polynomial time using polynomially many samples. The sample complexity can also be almost independent of the number of parameters in the network. On the technique side, our analysis goes beyond the so-called NTK (neural tangent kernel) linearization of neural networks in prior works. We establish a new notion of quadratic approximation of the neural network (that can be viewed as a second-order…

Taxonomy

Topics: Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning

Methods: Neural Tangent Kernel · Stochastic Gradient Descent