Deep Learning without Poor Local Minima

Kenji Kawaguchi

arXiv:1605.07110 · stat.ML · December 30, 2016 · 23 cites

TL;DR

This paper proves that deep linear neural networks have no poor local minima and characterizes saddle points, advancing the theoretical understanding of deep learning optimization landscapes.

Contribution

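It proves that every local minimum is global and characterizes the saddle points of deep linear networks without unrealistic assumptions, then extends the same results to deep nonlinear networks under an independence assumption adopted from prior work.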

Findings

01

All local minima are global minima.

02

Deeper networks (more than three layers) have "bad" saddle points, where the Hessian has no negative eigenvalue; shallow networks (three layers) do not.

03

Training deep models is theoretically tractable: the loss is non-convex, but it has no poor local minima.
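Finding 02 can be illustrated on a hypothetical toy model that is not taken from the paper. Assume a width-1 deep linear network with input x = 1 and target y = 1, so the squared loss is 0.5 * (w_1 * ... * w_n - 1)^2 for n weights. For n >= 2 the origin is a critical point, because every partial derivative contains a product of the other weights. The sketch below computes the analytic Hessian there. With two weights (three layers: input, one hidden, output), the Hessian has a negative eigenvalue, so the origin is an ordinary saddle. With three weights (four layers), the Hessian is the zero matrix, so it has no negative eigenvalue, yet the loss still decreases along w = (t, t, t). That makes the origin a "bad" saddle. The names `loss` and `hessian` are illustrative. Only the origin is checked here; the paper's statements cover all critical points and arbitrary widths.

```python
import numpy as np


def loss(w, y=1.0):
    """Squared loss of a hypothetical width-1 deep linear network with input x = 1."""
    return 0.5 * (np.prod(w) - y) ** 2


def hessian(w, y=1.0):
    """Analytic Hessian of loss(w) with respect to the scalar weights w_1..w_n.

    With r = prod(w) - y, dL/dw_i = r * prod_{k != i} w_k, so
      d2L/dw_i^2     = (prod_{k != i} w_k)^2
      d2L/dw_i dw_j  = prod_{k != i} w_k * prod_{k != j} w_k + r * prod_{k != i, j} w_k
    """
    w = np.asarray(w, dtype=float)
    n = len(w)
    r = np.prod(w) - y  # residual of the network output
    hess = np.zeros((n, n))
    for i in range(n):
        p_i = np.prod(np.delete(w, i))  # product of all weights except w_i
        for j in range(n):
            if i == j:
                hess[i, i] = p_i ** 2
            else:
                p_j = np.prod(np.delete(w, j))
                p_ij = np.prod(np.delete(w, [i, j]))  # empty product is 1.0
                hess[i, j] = p_i * p_j + r * p_ij
    return hess


for n in (2, 3):  # 2 weights = three layers, 3 weights = four layers
    origin = np.zeros(n)
    eig_min = np.linalg.eigvalsh(hessian(origin)).min()
    # A small step along (t, ..., t) lowers the loss, so the origin is not a local minimum.
    drops = loss(np.full(n, 0.1)) < loss(origin)
    print(f"{n} weights: min Hessian eigenvalue = {eig_min:.3f}, loss drops: {drops}")
```

For two weights the minimum eigenvalue is -1, which gives a second-order method a descent direction. For three weights every second-order term vanishes at the origin, so escaping requires looking at third-order behaviour; this is why such points are called bad.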

Abstract

In this paper, we prove a conjecture published in 1989 and also partially address an open problem announced at the Conference on Learning Theory (COLT) 2015. With no unrealistic assumption, we first prove the following statements for the squared loss function of deep linear neural networks with any depth and any widths: 1) the function is non-convex and non-concave, 2) every local minimum is a global minimum, 3) every critical point that is not a global minimum is a saddle point, and 4) there exist "bad" saddle points (where the Hessian has no negative eigenvalue) for the deeper networks (with more than three layers), whereas there is no bad saddle point for the shallow networks (with three layers). Moreover, for deep nonlinear neural networks, we prove the same four statements via a reduction to a deep linear model under the independence assumption adopted from recent work. As a…
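Statement 1 (non-convex and non-concave) already shows up in the smallest case. The sketch below assumes a hypothetical width-1 network with two weights, input x = 1 and target y = 1, so the squared loss is 0.5 * (w_1 * w_2 - 1)^2. The helper names are illustrative. A convex function never lies above the chord between two points and a concave one never lies below it, so one midpoint test in each direction rules out both properties.

```python
def loss(w1, w2, y=1.0):
    """Squared loss of a hypothetical width-1, two-weight linear network with input x = 1."""
    return 0.5 * (w1 * w2 - y) ** 2


def midpoint(a, b):
    """Midpoint of two weight vectors given as tuples."""
    return tuple((ai + bi) / 2 for ai, bi in zip(a, b))


def midpoint_gap(a, b):
    """loss(midpoint) minus the average of the endpoint losses.

    A positive gap violates convexity; a negative gap violates concavity.
    """
    return loss(*midpoint(a, b)) - 0.5 * (loss(*a) + loss(*b))


# (1, 1) and (-1, -1) both have loss 0, but their midpoint (0, 0) has loss 0.5.
print(midpoint_gap((1.0, 1.0), (-1.0, -1.0)))  # 0.5 > 0, so not convex
# (0, 0) and (2, 2) have losses 0.5 and 4.5, but their midpoint (1, 1) has loss 0.
print(midpoint_gap((0.0, 0.0), (2.0, 2.0)))  # -2.5 < 0, so not concave
```

The first pair also shows that the global minimum is not unique: the sign symmetry (w_1, w_2) -> (-w_1, -w_2) leaves the product unchanged, and a convex function cannot have separated minimizers with higher loss between them.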

Code & Models

Repositories

yijiazh/DFER_Summer2019 (TensorFlow)

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms