The Loss Surfaces of Multilayer Networks
Anna Choromanska, Mikael Henaff, Michael Mathieu, Gérard Ben Arous, Yann LeCun

TL;DR
This paper analyzes the loss surface of multilayer neural networks by connecting it to spin-glass models, revealing a layered structure of critical points and implications for optimization and generalization.
Contribution
It introduces a theoretical framework linking neural network loss landscapes to spin-glass models, explaining the distribution of local minima and their relation to network size.
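For reference, the spherical spin-glass Hamiltonian that the framework maps the loss onto can be written in the standard H-spin form below. In this sketch, Λ is taken to be the number of distinct network weights, H the network depth, and the couplings X i.i.d. standard Gaussian. This notation is our assumption and may not match the paper's exactly.

```latex
L_{\Lambda,H}(\tilde{w}) = \frac{1}{\Lambda^{(H-1)/2}} \sum_{i_1,\dots,i_H=1}^{\Lambda} X_{i_1,\dots,i_H}\, \tilde{w}_{i_1}\tilde{w}_{i_2}\cdots\tilde{w}_{i_H},
\qquad \text{subject to} \qquad \frac{1}{\Lambda}\sum_{i=1}^{\Lambda} \tilde{w}_i^2 = 1,
\qquad X_{i_1,\dots,i_H} \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0,1).
```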
Findings
For large networks, the lowest critical values of the loss form a layered structure inside a well-defined band bounded below by the global minimum.
The number of poor local minima outside that band decreases exponentially with network size.
Recovering the global minimum becomes harder as network size increases, and the global minimum itself often leads to overfitting.
Abstract
We study the connection between the highly non-convex loss function of a simple model of the fully-connected feed-forward neural network and the Hamiltonian of the spherical spin-glass model under the assumptions of: i) variable independence, ii) redundancy in network parametrization, and iii) uniformity. These assumptions enable us to explain the complexity of the fully decoupled neural network through the prism of the results from random matrix theory. We show that for large-size decoupled networks the lowest critical values of the random loss function form a layered structure and they are located in a well-defined band lower-bounded by the global minimum. The number of local minima outside that band diminishes exponentially with the size of the network. We empirically verify that the mathematical model exhibits similar behavior as the computer simulations, despite the presence of…
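As a rough numerical illustration of the band of low critical values, the sketch below runs projected gradient descent on a small spherical spin glass from several random starts and prints where the normalized energy H(w)/Λ ends up. It uses H(w) = Λ^{-(H-1)/2} Σ X_{i1..iH} w_{i1}...w_{iH} on the sphere ||w||² = Λ with i.i.d. standard Gaussian X. This is the idealized model, not a trained network. The sizes (Λ = 20, H = 3), step size, step count, and helper names (make_couplings, contract, energy, gradient, descend) are hypothetical choices made for speed. At such a small Λ finite-size effects are strong, so the spread of final energies is only suggestive of the large-size behavior the paper describes.

```python
import itertools

import numpy as np


def make_couplings(n, p, rng):
    """Draw i.i.d. N(0, 1) couplings X_{i1..ip} and symmetrize them.

    Averaging over index permutations leaves the Hamiltonian unchanged
    but lets the gradient be written as a single contraction.
    """
    x = rng.standard_normal((n,) * p)
    perms = list(itertools.permutations(range(p)))
    return sum(np.transpose(x, perm) for perm in perms) / len(perms)


def contract(xs, w, times):
    """Contract the last `times` axes of the tensor xs with the vector w."""
    out = xs
    for _ in range(times):
        out = out @ w
    return out


def energy(xs, w, p):
    """Spherical p-spin Hamiltonian: n^{-(p-1)/2} * sum X_{i1..ip} w_i1 ... w_ip."""
    n = w.size
    return float(n ** (-(p - 1) / 2) * contract(xs, w, p))


def gradient(xs, w, p):
    """Euclidean gradient of energy(); this form relies on xs being symmetric."""
    n = w.size
    return p * n ** (-(p - 1) / 2) * contract(xs, w, p - 1)


def descend(xs, w0, p, lr=0.05, steps=1000):
    """Projected gradient descent on the sphere ||w||^2 = n."""
    n = w0.size
    w = w0 * np.sqrt(n) / np.linalg.norm(w0)
    for _ in range(steps):
        g = gradient(xs, w, p)
        g_tan = g - (g @ w) / n * w  # drop the component normal to the sphere
        w = w - lr * g_tan
        w *= np.sqrt(n) / np.linalg.norm(w)  # retract back onto the sphere
    return w


rng = np.random.default_rng(0)
n, p = 20, 3  # hypothetical sizes: n stands in for Lambda, p for the depth H
xs = make_couplings(n, p, rng)

initial, final = [], []
for _ in range(10):
    w0 = rng.standard_normal(n)
    w0 *= np.sqrt(n) / np.linalg.norm(w0)
    initial.append(energy(xs, w0, p) / n)
    w = descend(xs, w0, p)
    final.append(energy(xs, w, p) / n)

print("start energies / n:", np.round(sorted(initial), 3))
print("final energies / n:", np.round(sorted(final), 3))
```

Random starts sit near zero energy, while descent ends well below it. Increasing n (memory grows as n**p) would be expected to tighten the spread of final energies if the asymptotic picture carries over, though this small sketch does not establish that.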
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Random Matrices and Applications
Methods: Stochastic Gradient Descent
