Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data

Yuanzhi Li; Yingyu Liang

arXiv:1808.01204·cs.LG·August 2, 2019·319 cites

TL;DR

This paper proves that SGD from random initialization trains an overparameterized two-layer ReLU network to small generalization error on structured data, and draws insights into neural network learning from the analysis.

Contribution

It offers the first rigorous analysis of SGD learning overparameterized neural networks on structured data, bridging theory and practice.

Findings

01. SGD learns a network with small generalization error when the data comes from mixtures of well-separated distributions.

02. Theoretical insights are supported by experiments on synthetic and MNIST data.

03. Generalization holds even though the overparameterized network has enough capacity to fit arbitrary labels.

Abstract

Neural networks have many successful applications, while much less theoretical understanding has been gained. Towards bridging this gap, we study the problem of learning a two-layer overparameterized ReLU neural network for multi-class classification via stochastic gradient descent (SGD) from random initialization. In the overparameterized setting, when the data comes from mixtures of well-separated distributions, we prove that SGD learns a network with a small generalization error, albeit the network has enough capacity to fit arbitrary labels. Furthermore, the analysis provides interesting insights into several aspects of learning neural networks and can be verified based on empirical studies on synthetic data and on the MNIST dataset.
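The setting described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' code: the problem sizes, noise level, learning rate, and the function names `sample`, `forward`, `train`, and `accuracy` are all hypothetical, and the choice to keep the output layer fixed at random values while training only the hidden layer is an assumption made here for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: input dim, classes, clusters per class, hidden width.
d, k, clusters_per_class, m = 20, 3, 2, 512

# Structured data: each class is a mixture of tight clusters around
# random unit-norm centers (an illustrative stand-in for "well-separated").
centers = rng.normal(size=(k * clusters_per_class, d))
centers /= np.linalg.norm(centers, axis=1, keepdims=True)
center_label = np.repeat(np.arange(k), clusters_per_class)


def sample(n):
    """Draw n unit-norm points and their class labels from the mixture."""
    idx = rng.integers(len(centers), size=n)
    x = centers[idx] + 0.05 * rng.normal(size=(n, d))
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    return x, center_label[idx]


def forward(W, A, x):
    """Two-layer ReLU network: returns pre-activations and class logits."""
    pre = x @ W.T                        # shape (n, m)
    logits = np.maximum(pre, 0.0) @ A.T  # shape (n, k)
    return pre, logits


def train(steps=3000, lr=1.0 / m):
    """Single-sample SGD on cross-entropy from random initialization."""
    W = rng.normal(scale=1.0 / np.sqrt(m), size=(m, d))  # trained
    A = rng.normal(size=(k, m))                          # fixed (assumption)
    for _ in range(steps):
        x, y = sample(1)
        pre, logits = forward(W, A, x)
        logits = logits - logits.max()   # shift for numerical stability
        p = np.exp(logits)
        p /= p.sum()
        p[0, y[0]] -= 1.0                # d(loss)/d(logits) = softmax - onehot
        grad_hidden = (p @ A) * (pre > 0)  # backprop through the ReLU
        W -= lr * grad_hidden.T @ x
    return W, A


def accuracy(W, A, n=1000):
    """Classification accuracy on n fresh samples from the same mixture."""
    x, y = sample(n)
    _, logits = forward(W, A, x)
    return float((logits.argmax(axis=1) == y).mean())
```

Calling `W, A = train()` followed by `accuracy(W, A)` estimates the error on unseen samples from the same mixture. The width `m` is deliberately much larger than the number of clusters, which is the overparameterized regime the abstract refers to.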

Taxonomy

Topics: Neural Networks and Applications · Machine Learning and ELM · Machine Learning and Algorithms

Methods: Stochastic Gradient Descent