Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks
Sanjeev Arora, Simon S. Du, Wei Hu, Zhiyuan Li, Ruosong Wang

TL;DR
This paper analyzes training dynamics and generalization in overparameterized two-layer ReLU neural networks. It gives a tighter characterization of training speed, a generalization bound that does not depend on network size, and a result showing that a broad class of smooth functions is learnable by gradient descent.
Contribution
The paper contributes a data-dependent generalization bound and a tighter account of training speed, both improving on prior theoretical analyses of overparameterized neural networks.
Findings
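Training with random labels is slower than training with true labels; a tighter characterization of training speed explains why.
The generalization bound is independent of network size, and experiments on MNIST and CIFAR show that its data-dependent complexity measure separates true labels from random labels.
A broad class of smooth functions can be learned by gradient descent on two-layer ReLU networks.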
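For reference, the characterization behind the first finding can be written out roughly as follows. These formulas are recalled from the full paper rather than from this summary, so the notation is an assumption to be checked against the paper: unit-norm inputs $x_i$, a Gram matrix $H^\infty$ with eigenpairs $(\lambda_i, v_i)$, step size $\eta$, and network predictions $u(k)$ after $k$ gradient descent steps.

```latex
H^\infty_{ij} = \frac{x_i^\top x_j \,\bigl(\pi - \arccos(x_i^\top x_j)\bigr)}{2\pi},
\qquad
H^\infty = \sum_{i=1}^{n} \lambda_i \, v_i v_i^\top,
\qquad
\| y - u(k) \|_2 \approx \sqrt{\sum_{i=1}^{n} (1 - \eta \lambda_i)^{2k} \, (v_i^\top y)^2 }.
```

Read this way, labels that align with the top eigenvectors make the residual shrink quickly. Random labels put more of their mass on directions with small eigenvalues, so they are fit more slowly.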
Abstract
Recent works have cast some light on the mystery of why deep nets fit any data and generalize despite being very overparametrized. This paper analyzes training and generalization for a simple 2-layer ReLU net with random initialization, and provides the following improvements over recent works: (i) Using a tighter characterization of training speed than recent papers, an explanation for why training a neural net with random labels leads to slower training, as originally observed in [Zhang et al. ICLR'17]. (ii) Generalization bound independent of network size, using a data-dependent complexity measure. Our measure distinguishes clearly between random labels and true labels on MNIST and CIFAR, as shown by experiments. Moreover, recent papers require sample complexity to increase (slowly) with the size, while our sample complexity is completely independent of the network size. (iii)…
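The data-dependent complexity measure in point (ii) can be illustrated with a small NumPy sketch. The formula used here, sqrt(2 yᵀ (H^∞)⁻¹ y / n) with H^∞_ij = x_iᵀx_j (π − arccos(x_iᵀx_j)) / (2π), is recalled from the full paper and does not appear in this summary, so treat it as an assumption. The synthetic data, the linear target, and the function names `relu_gram` and `complexity_measure` are hypothetical and chosen only for illustration; the paper's own experiments use MNIST and CIFAR.

```python
import numpy as np


def relu_gram(X):
    """Gram matrix H_ij = <x_i, x_j> (pi - arccos <x_i, x_j>) / (2 pi).

    Assumes every row of X has unit Euclidean norm.
    """
    G = np.clip(X @ X.T, -1.0, 1.0)  # guard arccos against rounding error
    return G * (np.pi - np.arccos(G)) / (2.0 * np.pi)


def complexity_measure(H, y):
    """Return sqrt(2 * y^T H^{-1} y / n) for labels y and Gram matrix H."""
    n = len(y)
    return float(np.sqrt(2.0 * (y @ np.linalg.solve(H, y)) / n))


# Hypothetical synthetic setup (not the paper's data).
rng = np.random.default_rng(0)
n, d = 100, 20
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # put inputs on the unit sphere

beta = rng.standard_normal(d)
beta /= np.linalg.norm(beta)
y_true = X @ beta                 # structured labels: a linear target
y_rand = rng.permutation(y_true)  # same values, input-label pairing destroyed

H = relu_gram(X)
m_true = complexity_measure(H, y_true)
m_rand = complexity_measure(H, y_rand)
print(f"structured labels: {m_true:.3f}   shuffled labels: {m_rand:.3f}")
```

Shuffling the labels keeps their norm fixed, so any gap between the two numbers comes from how the labels line up with the Gram matrix rather than from their scale. Note that neither function takes the network width as an argument, which matches the abstract's claim that the bound is independent of network size.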
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
