The Loss Surfaces of Multilayer Networks

Anna Choromanska; Mikael Henaff; Michael Mathieu; Gérard Ben Arous; Yann LeCun

arXiv:1412.0233·cs.LG·January 23, 2015·718 cites

TL;DR

This paper relates the loss surface of a simplified fully-connected feed-forward network to the Hamiltonian of the spherical spin-glass model, and shows that the lowest critical values form a layered structure inside a band bounded below by the global minimum.

Contribution

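It introduces a theoretical framework that links neural network loss landscapes to spherical spin-glass models under three assumptions (variable independence, redundancy in network parametrization, and uniformity), and uses results from random matrix theory to describe where local minima lie and how their number scales with network size.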

Findings

01

For large decoupled networks, the lowest critical values of the loss form a layered structure inside a well-defined band that is bounded below by the global minimum.

02

The number of local minima outside that band decreases exponentially with network size.

03

Recovering the global minimum becomes harder as network size increases, and doing so matters little in practice because the global minimum often leads to overfitting.

Abstract

We study the connection between the highly non-convex loss function of a simple model of the fully-connected feed-forward neural network and the Hamiltonian of the spherical spin-glass model under the assumptions of: i) variable independence, ii) redundancy in network parametrization, and iii) uniformity. These assumptions enable us to explain the complexity of the fully decoupled neural network through the prism of the results from random matrix theory. We show that for large-size decoupled networks the lowest critical values of the random loss function form a layered structure and they are located in a well-defined band lower-bounded by the global minimum. The number of local minima outside that band diminishes exponentially with the size of the network. We empirically verify that the mathematical model exhibits similar behavior as the computer simulations, despite the presence of…
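As a rough illustration of the spin-glass side of this connection, the sketch below samples a spherical 3-spin Hamiltonian with i.i.d. Gaussian couplings and runs projected gradient descent on it from several random starting points. It is not the authors' code. The function names, the choice p = 3, the system size, the step size, and the number of steps are all assumptions made for illustration. The energy uses the standard normalization H(w) = n^{-(p-1)/2} Σ X_{ijk} w_i w_j w_k under the spherical constraint Σ w_i² = n.

```python
import numpy as np


def sample_couplings(n, rng):
    """Draw i.i.d. standard Gaussian couplings X[i, j, k] (assumed p = 3)."""
    return rng.standard_normal((n, n, n))


def hamiltonian(X, w):
    """Spherical 3-spin energy: (1 / n) * sum_ijk X_ijk w_i w_j w_k."""
    n = w.shape[0]
    return np.einsum("ijk,i,j,k->", X, w, w, w) / n


def gradient(X, w):
    """Euclidean gradient of the energy; X is not symmetrized, so sum all three slots."""
    n = w.shape[0]
    g = (np.einsum("ijk,j,k->i", X, w, w)
         + np.einsum("ijk,i,k->j", X, w, w)
         + np.einsum("ijk,i,j->k", X, w, w))
    return g / n


def project(w):
    """Rescale w onto the sphere defined by sum_i w_i^2 = n."""
    n = w.shape[0]
    return w * np.sqrt(n) / np.linalg.norm(w)


def descend(X, w0, lr=0.05, steps=500):
    """Projected gradient descent on the sphere (lr and steps are illustrative)."""
    w = project(w0)
    for _ in range(steps):
        g = gradient(X, w)
        g = g - (g @ w) / (w @ w) * w  # drop the radial component
        w = project(w - lr * g)
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 20
    X = sample_couplings(n, rng)
    for trial in range(5):
        w = descend(X, rng.standard_normal(n))
        # Energy per spin of the point reached from this random start.
        print(trial, hamiltonian(X, w) / n)
```

Comparing the printed energies per spin across starts, and repeating the run for larger n, gives an informal view of how tightly the reached critical values cluster. At these small sizes it is only a toy experiment and does not reproduce the paper's results.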

Code & Models

Repositories

jchunn/Ambition
tf

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Random Matrices and Applications

Methods: Stochastic Gradient Descent