Depth Creates No Bad Local Minima

Haihao Lu; Kenji Kawaguchi

arXiv:1702.08580 · cs.LG · May 25, 2017 · 81 citations

TL;DR

This paper proves that depth alone, without nonlinearity, does not create bad local minima in deep linear neural networks, even though it makes the loss surface non-convex. The argument greatly simplifies a recently proposed proof and needs fewer assumptions than earlier results.

Contribution

The paper shows that all local minima of feedforward deep linear neural networks are global minima, generalizes previous results under fewer assumptions, and provides a method of analysis for showing similar results beyond square loss.

Findings

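01. All local minima of feedforward deep linear neural networks are global minima.

02. Without nonlinearity, depth alone does not create bad local minima, although it does make the loss surface non-convex.

03. The analysis provides a method for showing similar results beyond square loss in deep linear models.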
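
For concreteness, the square-loss setting these findings refer to can be written as below. This is a sketch in notation assumed here for illustration (data matrix X, target matrix Y, depth H, weight matrices W_1, ..., W_H); the paper's own symbols and assumptions may differ.

```latex
\[
  \min_{W_1,\dots,W_H} \; L(W_1,\dots,W_H)
  \;=\; \tfrac{1}{2}\,\bigl\| W_H W_{H-1} \cdots W_1 X - Y \bigr\|_F^2 .
\]
```

Written in terms of the product R = W_H ... W_1, the loss is a convex quadratic in R. For H >= 2 it is in general not jointly convex in the individual factors, which is the non-convexity that depth introduces.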

Abstract

In deep learning, depth, as well as nonlinearity, creates non-convex loss surfaces. Then, does depth alone create bad local minima? In this paper, we prove that without nonlinearity, depth alone does not create bad local minima, although it induces a non-convex loss surface. Using this insight, we greatly simplify a recently proposed proof to show that all of the local minima of feedforward deep linear neural networks are global minima. Our theoretical results generalize previous results with fewer assumptions, and this analysis provides a method to show similar results beyond square loss in deep linear models.
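
The abstract's claim can be sanity-checked numerically on a toy case. The sketch below is not from the paper: it assumes the smallest possible deep linear network (depth 2, width 1, a single training pair x = 1, y = 1), so the loss is 0.5 * (w2 * w1 - 1)^2. The function names, step size, and number of starts are illustrative choices. It checks that the origin is a saddle point (so the surface is non-convex) and that plain gradient descent from random starting points ends at a loss near zero, the global minimum.

```python
import numpy as np

def loss(w):
    """Square loss of a depth-2, width-1 linear network fitting x=1 to y=1."""
    w1, w2 = w
    return 0.5 * (w2 * w1 - 1.0) ** 2

def grad(w):
    """Gradient of the loss with respect to (w1, w2)."""
    w1, w2 = w
    r = w2 * w1 - 1.0                      # residual of the product
    return np.array([r * w2, r * w1])

def hessian(w):
    """Hessian of the loss with respect to (w1, w2)."""
    w1, w2 = w
    off = 2.0 * w1 * w2 - 1.0              # mixed second derivative
    return np.array([[w2 ** 2, off], [off, w1 ** 2]])

def gradient_descent(w0, lr=0.05, steps=5000):
    """Plain gradient descent; lr and steps are illustrative assumptions."""
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

# The origin is a critical point whose Hessian has one negative and one
# positive eigenvalue: a saddle, so the loss is non-convex in (w1, w2).
origin = np.zeros(2)
saddle_eigs = np.linalg.eigvalsh(hessian(origin))

# Run gradient descent from several random starting points.
rng = np.random.default_rng(0)
starts = rng.uniform(-2.0, 2.0, size=(20, 2))
final_losses = np.array([loss(gradient_descent(w0)) for w0 in starts])

print("Hessian eigenvalues at the origin:", saddle_eigs)
print("largest final loss over all starts:", final_losses.max())
```

In this toy case the only critical points are the origin and the curve w1 * w2 = 1, so there is nowhere for a bad local minimum to sit. The experiment only illustrates the statement; it does not stand in for the paper's proof, which covers general depth and matrix-valued weights.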

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Adversarial Robustness in Machine Learning