No bad local minima: Data independent training error guarantees for multilayer neural networks

Daniel Soudry; Yair Carmon

arXiv:1605.08361·stat.ML·May 31, 2016·159 cites

TL;DR

This paper proves that, under mild over-parametrization, multilayer neural networks with piecewise linear activations, quadratic loss, and a single output have zero training error at every differentiable local minimum for almost every dataset, which suggests why they train easily despite a non-convex loss.

Contribution

It provides data-independent guarantees that every differentiable local minimum has zero training error in over-parameterized networks with one hidden layer, and then extends these guarantees to networks with more than one hidden layer.

Findings

01

Under mild over-parametrization, every differentiable local minimum of a network with one hidden layer has zero training error.

02

The results hold for almost every dataset and almost every dropout-like noise realization.

03

Numerical experiments verify the guarantees, which suggest why local updates such as stochastic gradient descent optimize these networks easily in practice.

Abstract

We use smoothed analysis techniques to provide guarantees on the training loss of Multilayer Neural Networks (MNNs) at differentiable local minima. Specifically, we examine MNNs with piecewise linear activation functions, quadratic loss and a single output, under mild over-parametrization. We prove that for a MNN with one hidden layer, the training error is zero at every differentiable local minimum, for almost every dataset and dropout-like noise realization. We then extend these results to the case of more than one hidden layer. Our theoretical guarantees assume essentially nothing on the training data, and are verified numerically. These results suggest why the highly non-convex loss of such MNNs can be easily optimized using local updates (e.g., stochastic gradient descent), as observed empirically.
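One way to build intuition for the single-hidden-layer claim is a rank argument. It is offered here as an illustrative sketch under stated assumptions, not necessarily the authors' proof. For quadratic loss, the gradient with respect to the first-layer weights is proportional to J^T e, where e is the vector of per-sample errors and J is the N x (d0*d1) Jacobian of the network output with respect to those weights. If J has rank N, then J^T e = 0 forces e = 0, so a differentiable critical point must have zero training error. The names `first_layer_jacobian`, `matrix_rank`, and `jacobian_rank`, the leaky-ReLU slope of 0.1, the uniform multiplicative noise standing in for "dropout-like noise", the dimensions, and the condition N <= d0*d1 are all assumptions made for this example.

```python
import random

def first_layer_jacobian(X, W1, w2, noise, slope=0.1):
    """Row n holds d f(x_n) / d W1[i][j] for a one-hidden-layer leaky-ReLU net.

    Assumed model: f(x) = sum_i w2[i] * noise[n][i] * a_i * (W1[i] . x),
    where a_i is 1 if the pre-activation is positive and `slope` otherwise.
    """
    rows = []
    for x, eps in zip(X, noise):
        row = []
        for w_row, v, e in zip(W1, w2, eps):
            pre = sum(w * xj for w, xj in zip(w_row, x))
            a = v * e * (1.0 if pre > 0 else slope)
            row.extend(a * xj for xj in x)  # derivative w.r.t. W1[i][0..d0-1]
        rows.append(row)
    return rows

def matrix_rank(M, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    A = [r[:] for r in M]
    n_rows, n_cols = len(A), len(A[0])
    rank = 0
    for c in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(A[r][c]))
        if abs(A[pivot][c]) < tol:
            continue  # no usable pivot in this column
        A[rank], A[pivot] = A[pivot], A[rank]
        for r in range(rank + 1, n_rows):
            f = A[r][c] / A[rank][c]
            for k in range(c, n_cols):
                A[r][k] -= f * A[rank][k]
        rank += 1
    return rank

def jacobian_rank(n_samples, d0, d1, rng):
    """Draw random data, weights, and noise, then return rank of the Jacobian."""
    X = [[rng.gauss(0, 1) for _ in range(d0)] for _ in range(n_samples)]
    W1 = [[rng.gauss(0, 1) for _ in range(d0)] for _ in range(d1)]
    w2 = [rng.uniform(0.5, 1.5) for _ in range(d1)]
    noise = [[rng.uniform(0.5, 1.5) for _ in range(d1)] for _ in range(n_samples)]
    return matrix_rank(first_layer_jacobian(X, W1, w2, noise))

rng = random.Random(0)
D0, D1 = 5, 4               # hypothetical input and hidden widths (20 weights)
N_OVER, N_UNDER = 12, 30    # fewer / more samples than first-layer weights

rank_over = jacobian_rank(N_OVER, D0, D1, rng)
rank_under = jacobian_rank(N_UNDER, D0, D1, rng)
print("over-parametrized rank:", rank_over, "of", N_OVER, "samples")
print("under-parametrized rank:", rank_under, "of", N_UNDER, "samples")
```

In the over-parametrized case the Jacobian is expected to have one independent row per sample, so a vanishing gradient leaves no room for a nonzero error vector. In the under-parametrized case the rank cannot exceed the number of first-layer weights, so nonzero error vectors with zero gradient can exist. The continuous noise matters in this sketch: without it, several samples sharing the same activation pattern could make the Jacobian rank deficient.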


Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning · Advanced Neural Network Applications