The loss surface of deep and wide neural networks

Quynh Nguyen; Matthias Hein

arXiv:1704.08045 · cs.LG · June 14, 2017 · 118 citations

Open Access

TL;DR

For wide, pyramidal, fully connected networks with squared loss and an analytic activation function, almost all local minima are globally optimal, which helps explain why training often succeeds despite non-convexity.

Contribution

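The paper proves that almost all local minima are globally optimal for a fully connected network with squared loss and an analytic activation function, provided that one hidden layer has more units than there are training points and the network is pyramidal from that layer on.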

Findings

01

Almost all local minima are globally optimal in the specified network class.

02

The result requires that a single hidden layer has more units than there are training points, and that the network is pyramidal from that layer on.

03

The result supports the empirical observation that training deep networks rarely gets stuck in suboptimal points, despite the non-convexity of the problem.

Abstract

While the optimization problem behind deep neural networks is highly non-convex, it is frequently observed in practice that training deep networks seems possible without getting stuck in suboptimal points. It has been argued that this is the case as all local minima are close to being globally optimal. We show that this is (almost) true, in fact almost all local minima are globally optimal, for a fully connected network with squared loss and analytic activation function given that the number of hidden units of one layer of the network is larger than the number of training points and the network structure from this layer on is pyramidal.
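The two structural conditions in the abstract can be checked mechanically for a given architecture. The sketch below is illustrative only and is not taken from the paper. It assumes that "pyramidal" means the layer widths do not increase from the wide layer to the output, and that "larger than" is a strict inequality. The function name `find_wide_pyramidal_layer` and the convention of passing hidden widths followed by the output width are hypothetical. The precise theorem statements in the paper may differ in detail, and the squared-loss and analytic-activation requirements are not checked here.

```python
from typing import Optional, Sequence


def find_wide_pyramidal_layer(widths: Sequence[int], num_train: int) -> Optional[int]:
    """Return the index of the first hidden layer that is wider than the
    training set and from which the widths never increase, or None.

    `widths` lists the hidden layer widths followed by the output width
    (an assumed convention for this sketch, not the paper's notation).
    """
    # The last entry is the output layer, so only earlier entries can be
    # the wide hidden layer.
    for k, width in enumerate(widths[:-1]):
        if width <= num_train:
            continue  # not wider than the number of training points
        tail = widths[k:]
        # Assumed reading of "pyramidal": widths are non-increasing
        # from layer k through the output layer.
        if all(a >= b for a, b in zip(tail, tail[1:])):
            return k
    return None


# 1000 training points: the layer with 1200 units is wide enough, and the
# widths 1200 >= 300 >= 10 do not increase afterwards.
print(find_wide_pyramidal_layer([64, 1200, 300, 10], num_train=1000))
```

Under these assumptions, an architecture such as `[2000, 100, 500, 10]` with 1000 training points does not qualify, because the widths increase again after the wide layer.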

Taxonomy

Topics: Neural Networks and Applications