No bad local minima: Data independent training error guarantees for multilayer neural networks
Daniel Soudry, Yair Carmon

TL;DR
This paper proves that, under mild over-parametrization, multilayer neural networks (MNNs) with piecewise linear activations have zero training error at every differentiable local minimum, for almost every dataset and dropout-like noise realization. This suggests why such networks are easy to train despite their non-convex loss.
Contribution
It provides data-independent guarantees that every differentiable local minimum of a mildly over-parametrized MNN has zero training error. The result is proved first for a single hidden layer and then extended to more than one hidden layer.
Findings
All differentiable local minima have zero training error for single hidden layer networks.
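The results hold for almost every dataset and dropout-like noise realization.
The guarantees are verified numerically. They are consistent with the empirical observation that such networks are easily optimized with local updates such as stochastic gradient descent.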
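The single-hidden-layer finding can be motivated by a short linear-algebra step. The following is a sketch in our own notation, not the paper's exact formulation or proof. Assume a network with inputs x_n in R^{d_0}, hidden weights W in R^{d_1 x d_0}, output weights z in R^{d_1}, noise vectors epsilon_n fixed per sample, targets t_n, and an elementwise piecewise linear activation with sigma(u) = a(u) u. For a leaky ReLU with negative slope s, a(u) is either s or 1. Rows of W are stacked by vec, so entry (i, j) sits at position (i-1) d_0 + j.

```latex
\begin{align*}
  y_n &= z^\top \operatorname{diag}(\epsilon_n)\,\sigma(W x_n)
       = z^\top \operatorname{diag}(\epsilon_n \circ a_n)\, W x_n,
       \qquad a_n = a(W x_n), \\
  L(W) &= \frac{1}{2N} \sum_{n=1}^{N} e_n^2, \qquad e_n = y_n - t_n, \\
  \frac{\partial L}{\partial W_{ij}} &= \frac{1}{N} \sum_{n=1}^{N} e_n\, z_i\, \epsilon_{ni}\, a_{ni}\, x_{nj}, \\
  \operatorname{vec}(\nabla_W L) &= \frac{1}{N} \operatorname{diag}(z \otimes \mathbf{1}_{d_0})\, G\, e,
  \qquad G = \big[(\epsilon_1 \circ a_1) \otimes x_1, \;\dots,\; (\epsilon_N \circ a_N) \otimes x_N\big] \in \mathbb{R}^{d_1 d_0 \times N}.
\end{align*}
```

Away from the activation kinks, a_n is locally constant, so the last line is the gradient. At a differentiable local minimum this gradient vanishes. If every z_i is nonzero and G has full column rank N, then G e = 0 forces e = 0, which means zero training error. Full column rank is only possible when N <= d_0 d_1. The qualifier "almost every dataset and noise realization" presumably reflects that a rank condition of this kind holds generically. The sketch does not prove that part.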
Abstract
We use smoothed analysis techniques to provide guarantees on the training loss of Multilayer Neural Networks (MNNs) at differentiable local minima. Specifically, we examine MNNs with piecewise linear activation functions, quadratic loss and a single output, under mild over-parametrization. We prove that for an MNN with one hidden layer, the training error is zero at every differentiable local minimum, for almost every dataset and dropout-like noise realization. We then extend these results to the case of more than one hidden layer. Our theoretical guarantees assume essentially nothing about the training data, and are verified numerically. These results suggest why the highly non-convex loss of such MNNs can be easily optimized using local updates (e.g., stochastic gradient descent), as observed empirically.
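The sketch below does not reproduce the paper's numerical experiments. It illustrates, for one random draw, a rank argument of the kind that plausibly underlies the one-hidden-layer guarantee. This is our reconstruction, and the following choices are all assumptions rather than the paper's definitions:

- the network y_n = z^T (eps_n * leaky_relu(W x_n)), with a leaky ReLU of slope SLOPE = 0.1;
- the loss 0.5 * mean(e_n^2), where e_n = y_n - t_n;
- per-sample multiplicative Gaussian noise eps standing in for "dropout-like" noise;
- the helper names forward, gain_matrix and loss_and_grad.

For a fixed activation pattern, the gradient with respect to W is a linear map M applied to the residual vector e. If M has full column rank, a zero gradient forces e = 0, and more generally ||e|| <= ||grad|| / sigma_min(M).

```python
import numpy as np

SLOPE = 0.1  # assumed leaky-ReLU negative slope (any piecewise linear activation would do)

def forward(W, z, X, eps):
    """Outputs y_n = z^T (eps_n * leaky_relu(W x_n)) of a one-hidden-layer network."""
    pre = X @ W.T  # (N, d1) pre-activations; row n is W x_n
    return (eps * np.where(pre > 0, pre, SLOPE * pre)) @ z

def gain_matrix(W, z, X, eps):
    """Matrix M with grad_W(loss).ravel() == M @ e for the current activation pattern.

    The unscaled columns (eps_n * a_n) kron x_n form a Khatri-Rao
    (column-wise Kronecker) product; each row block i is scaled by z_i / N.
    """
    N, d0 = X.shape
    pre = X @ W.T
    A = eps * np.where(pre > 0, 1.0, SLOPE)  # (N, d1) effective gains eps_n * a_n
    G = np.stack([np.kron(A[n], X[n]) for n in range(N)], axis=1)  # (d1*d0, N)
    return np.repeat(z, d0)[:, None] * G / N

def loss_and_grad(W, z, X, eps, t):
    """Quadratic loss 0.5 * mean(e_n^2), its gradient w.r.t. W, and the residuals e."""
    e = forward(W, z, X, eps) - t
    grad = (gain_matrix(W, z, X, eps) @ e).reshape(W.shape)
    return 0.5 * np.mean(e ** 2), grad, e

rng = np.random.default_rng(0)
d0, d1, N = 3, 4, 12  # N <= d0 * d1: no more samples than first-layer weights
X = rng.standard_normal((N, d0))
t = rng.standard_normal(N)
W = rng.standard_normal((d1, d0))
z = rng.standard_normal(d1)  # has no zero entries with probability 1
eps = 1.0 + 0.1 * rng.standard_normal((N, d1))  # assumed noise model, fixed per sample

M = gain_matrix(W, z, X, eps)
sigma_min = np.linalg.svd(M, compute_uv=False).min()
# Full column rank: the gradient M @ e can vanish only if every residual e_n is zero.
print("rank:", np.linalg.matrix_rank(M), "of", N, "| sigma_min > 0:", sigma_min > 0)
```

With more samples than first-layer weights (N > d0 * d1), M has more columns than rows. Its rank is then at most d0 * d1 < N, so some nonzero residual vectors produce a zero gradient. This shows why an over-parametrization condition is needed. The check above covers only the single activation pattern at one random W. A guarantee for every differentiable local minimum would need the rank condition for every pattern such a minimum could have.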
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning · Advanced Neural Network Applications
