SGD Learns Over-parameterized Networks that Provably Generalize on Linearly Separable Data
Alon Brutzkus, Amir Globerson, Eran Malach, Shai Shalev-Shwartz

TL;DR
This paper proves that stochastic gradient descent (SGD) converges to a global minimum when training over-parameterized two-layer neural networks with Leaky ReLU activations on linearly separable data, and that the solution it finds generalizes without overfitting.
Contribution
It provides the first theoretical guarantees that SGD finds global minima and generalizes well in over-parameterized neural networks on linearly separable data.
Findings
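SGD converges to a global minimum of the training loss for over-parameterized two-layer Leaky ReLU networks on linearly separable data, with a proven convergence rate.
The generalization guarantees for this global minimum are independent of network size.
SGD avoids overfitting despite the high capacity of the model.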
Abstract
Neural networks exhibit good generalization behavior in the over-parameterized regime, where the number of network parameters exceeds the number of observations. Nonetheless, current generalization bounds for neural networks fail to explain this phenomenon. In an attempt to bridge this gap, we study the problem of learning a two-layer over-parameterized neural network, when the data is generated by a linearly separable function. In the case where the network has Leaky ReLU activations, we provide both optimization and generalization guarantees for over-parameterized networks. Specifically, we prove convergence rates of SGD to a global minimum and provide generalization guarantees for this global minimum that are independent of the network size. Therefore, our result clearly shows that the use of SGD for optimization both finds a global minimum, and avoids overfitting despite the high…
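The setting described in the abstract can be illustrated with a small NumPy sketch: SGD on a two-layer Leaky ReLU network whose width (2k hidden units) can exceed the number of training points, trained on data labeled by a linear separator. Several details here are assumptions made for illustration and are not stated in the summary above: the second layer `v` is held fixed, the loss is the hinge loss, updates use one example at a time, and the names `make_separable_data`, `train_sgd`, `predict`, the margin filter, and all hyperparameters are hypothetical.

```python
import numpy as np

def leaky_relu(z, alpha=0.1):
    # Leaky ReLU: identity for positive inputs, slope alpha otherwise.
    return np.where(z > 0, z, alpha * z)

def make_separable_data(n, d, rng, margin=0.1):
    # Hypothetical data generator: unit-norm points labeled by a random
    # linear separator w_star; points too close to the boundary are dropped.
    w_star = rng.normal(size=d)
    w_star /= np.linalg.norm(w_star)
    X = rng.normal(size=(n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    scores = X @ w_star
    keep = np.abs(scores) >= margin
    return X[keep], np.sign(scores[keep])

def predict(W, v, X, alpha=0.1):
    # Network output: v^T leaky_relu(W x) for each row x of X.
    return leaky_relu(X @ W.T, alpha) @ v

def train_sgd(X, y, k=50, lr=0.1, alpha=0.1, max_epochs=200, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # First layer is trained; second layer is fixed (an assumption of this sketch).
    W = rng.normal(scale=0.01, size=(2 * k, d))
    v = np.concatenate([np.ones(k), -np.ones(k)]) / np.sqrt(2 * k)
    for _ in range(max_epochs):
        updates = 0
        for i in rng.permutation(len(X)):
            x, label = X[i], y[i]
            pre = W @ x
            out = v @ leaky_relu(pre, alpha)
            if label * out < 1:  # hinge loss is active on this example
                slope = np.where(pre > 0, 1.0, alpha)
                # Gradient step on the hinge loss with respect to W.
                W += lr * label * (v * slope)[:, None] * x[None, :]
                updates += 1
        if updates == 0:  # every example has hinge loss zero
            break
    return W, v
```

A possible usage is to generate data with `make_separable_data(200, 5, np.random.default_rng(0))`, train with different values of `k`, and compare `np.sign(predict(W, v, X))` against the labels on held-out points. This only illustrates the setup; it does not reproduce the paper's bounds.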
Taxonomy
Topics: Neural Networks and Applications · Stochastic Gradient Optimization Techniques · Model Reduction and Neural Networks
Methods: Stochastic Gradient Descent
