Elimination of All Bad Local Minima in Deep Learning

Kenji Kawaguchi; Leslie Pack Kaelbling

arXiv:1901.00279 · cs.LG · January 17, 2020 · 38 cites

TL;DR

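This paper proves that, under practical assumptions, adding one special neuron per output unit eliminates all suboptimal (bad) local minima of a deep neural network for multi-class classification, binary classification, and regression: at every local minimum of the modified network, the parameters of the original network are a global minimum of the original network.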

Contribution

It gives a theoretical construction that removes all suboptimal local minima of a deep learning model by adding one neuron per output unit, together with proofs, a characterization of a failure mode of this elimination, and a new proof technique.

Findings

01

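Adding one special neuron per output unit guarantees that, at every local minimum of the modified network, the parameters of the original network are a global minimum of the original network.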

02

The effects of the added neurons are proven to vanish automatically at every local minimum.

03

A novel proof technique, based on the perturbable gradient basis (PGB) necessary condition of local minima, is introduced.

Abstract

In this paper, we theoretically prove that adding one special neuron per output unit eliminates all suboptimal local minima of any deep neural network, for multi-class classification, binary classification, and regression with an arbitrary loss function, under practical assumptions. At every local minimum of any deep neural network with these added neurons, the set of parameters of the original neural network (without added neurons) is guaranteed to be a global minimum of the original neural network. The effects of the added neurons are proven to automatically vanish at every local minimum. Moreover, we provide a novel theoretical characterization of a failure mode of eliminating suboptimal local minima via an additional theorem and several examples. This paper also introduces a novel proof technique based on the perturbable gradient basis (PGB) necessary condition of local minima,…
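The construction described in the abstract can be illustrated with a minimal sketch. The text above does not give the functional form of the added neurons, so everything concrete here is an assumption made for illustration: an exponential added neuron a_k * exp(w_k^T x + b_k) per output unit, a squared-norm penalty on a with weight lam, a squared loss, and a small tanh network standing in for the original model. The names base_network, added_neurons, and modified_loss are hypothetical. The sketch only shows that when the added neurons' output weights a are zero, the modified objective reduces to the original one, which is the situation the abstract says holds at every local minimum; it does not demonstrate the theorem itself.

```python
import numpy as np

def base_network(x, theta):
    """Hypothetical stand-in for the original network f(x; theta)."""
    W1, W2 = theta
    return np.tanh(x @ W1) @ W2              # shape (m, d_out)

def added_neurons(x, a, b, W):
    """One extra neuron per output unit (exponential form is an assumption)."""
    return a * np.exp(x @ W + b)             # shape (m, d_out)

def squared_loss(pred, y):
    """Mean over samples of the squared error summed over output units."""
    return np.mean(np.sum((pred - y) ** 2, axis=1))

def modified_loss(x, y, theta, a, b, W, lam=0.1):
    """Loss of the network with added neurons, plus an assumed penalty on a."""
    pred = base_network(x, theta) + added_neurons(x, a, b, W)
    return squared_loss(pred, y) + lam * np.sum(a ** 2)

rng = np.random.default_rng(0)
m, d_in, d_hidden, d_out = 8, 3, 4, 2
x = rng.normal(size=(m, d_in))
y = rng.normal(size=(m, d_out))
theta = (rng.normal(size=(d_in, d_hidden)), rng.normal(size=(d_hidden, d_out)))
W = rng.normal(size=(d_in, d_out))           # one weight vector per output unit
b = rng.normal(size=d_out)

original = squared_loss(base_network(x, theta), y)
with_zero_a = modified_loss(x, y, theta, np.zeros(d_out), b, W)
with_unit_a = modified_loss(x, y, theta, np.ones(d_out), b, W)
```

With a set to zero, with_zero_a equals original, because both the added output and the penalty are zero; with a nonzero a, the two objectives generally differ.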

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Machine Learning and ELM · Machine Learning and Algorithms