Annihilation of Spurious Minima in Two-Layer ReLU Networks
Yossi Arjevani, Michael Field

TL;DR
This paper provides a rigorous analysis of how over-parameterization in two-layer ReLU networks eliminates spurious local minima by leveraging symmetry, algebraic geometry, and spectral analysis, thus facilitating gradient-based optimization.
Contribution
It introduces new algebraic and geometric tools for analyzing the loss landscape and shows how increasing the number of neurons turns spurious minima into saddle points, easing gradient-based optimization.
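The sketch below makes the objective concrete with a teacher-student squared loss for a two-layer ReLU network. It goes beyond what this summary states and assumes the following:

- Inputs are standard Gaussian, x ~ N(0, I_d).
- All second-layer weights are fixed to 1.
- The rows of a matrix V are the target neurons.

The helper names `relu_kernel` and `population_loss` are hypothetical. Under these assumptions the expectation has a closed form through the degree-1 arc-cosine kernel, E[relu(w·x) relu(v·x)] = ‖w‖‖v‖(sin θ + (π − θ) cos θ)/(2π), where θ is the angle between w and v. No sampling is needed.

```python
import numpy as np

def relu_kernel(w: np.ndarray, v: np.ndarray) -> float:
    """E[relu(w @ x) * relu(v @ x)] for x ~ N(0, I_d) (degree-1 arc-cosine kernel)."""
    nw, nv = np.linalg.norm(w), np.linalg.norm(v)
    if nw == 0.0 or nv == 0.0:
        return 0.0  # a zero neuron contributes nothing
    # Clip guards against round-off pushing the cosine outside [-1, 1].
    cos_t = np.clip(w @ v / (nw * nv), -1.0, 1.0)
    theta = np.arccos(cos_t)
    return nw * nv * (np.sin(theta) + (np.pi - theta) * cos_t) / (2 * np.pi)

def population_loss(W: np.ndarray, V: np.ndarray) -> float:
    """Hypothetical helper: 0.5 * E[(sum_i relu(w_i . x) - sum_j relu(v_j . x))^2].

    Rows of W are student neurons, rows of V are target neurons; the
    second-layer weights are assumed to be fixed to 1 (an assumption).
    """
    def gram_sum(A, B):
        return sum(relu_kernel(a, b) for a in A for b in B)
    # Expand the square: <student, student> - 2 <student, target> + <target, target>.
    return 0.5 * (gram_sum(W, W) - 2 * gram_sum(W, V) + gram_sum(V, V))

d = 4
V = np.eye(d)  # assumed target network: identity weight matrix
rng = np.random.default_rng(0)
print(population_loss(V, V))  # student equals target: zero loss
print(population_loss(V + 0.1 * rng.standard_normal((d, d)), V))  # perturbed student: positive loss
W_over = np.vstack([V, np.zeros((1, d))])  # one extra (zero) student neuron
print(population_loss(W_over, V))  # still zero loss
```

Appending a zero neuron leaves the loss unchanged. This is one trivial way an over-parameterized student can represent the target exactly. It illustrates the setting only, not the paper's analysis of how extra neurons reshape the landscape.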
Findings
Adding neurons can turn symmetric spurious minima into saddle points; minima of lesser symmetry require more neurons.
Sharp analytic estimates are obtained for the loss and the Hessian spectrum at different minima.
Symmetry-breaking perturbations do not negate the spectral properties for fixed network size.
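The distinction between a minimum and a saddle comes from the second-order test, which can be sketched numerically. At a critical point:

- A strictly negative Hessian eigenvalue makes the point a saddle.
- All eigenvalues strictly positive make it a strict local minimum.
- Zero eigenvalues leave the test inconclusive.

The sketch below uses a central-difference Hessian on toy functions. It is a generic illustration with hypothetical helper names (`numerical_hessian`, `classify_critical_point`), whereas the paper derives the Hessian spectrum analytically.

```python
import numpy as np

def numerical_hessian(f, x, h=1e-4):
    """Central-difference approximation of the Hessian of f: R^n -> R at x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    E = np.eye(n) * h
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            H[i, j] = (f(x + E[i] + E[j]) - f(x + E[i] - E[j])
                       - f(x - E[i] + E[j]) + f(x - E[i] - E[j])) / (4 * h * h)
    return 0.5 * (H + H.T)  # symmetrize away round-off asymmetry

def classify_critical_point(H, tol=1e-6):
    """Second-order test at a critical point, based on the Hessian spectrum."""
    eigs = np.linalg.eigvalsh(H)  # ascending order
    if eigs[0] < -tol:
        return "saddle"  # the eigenvector of eigs[0] is a negative-curvature direction
    if eigs[0] > tol:
        return "strict local minimum"
    return "degenerate (test inconclusive)"

saddle = lambda z: z[0] ** 2 - z[1] ** 2
bowl = lambda z: z[0] ** 2 + z[1] ** 2
print(classify_critical_point(numerical_hessian(saddle, [0.0, 0.0])))  # saddle
print(classify_critical_point(numerical_hessian(bowl, [0.0, 0.0])))    # strict local minimum
```

The tolerance `tol` is an arbitrary illustrative choice. In the paper's setting, the interesting question is how the smallest eigenvalue at a given critical point changes sign as neurons are added.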
Abstract
We study the optimization problem associated with fitting two-layer ReLU neural networks with respect to the squared loss, where labels are generated by a target network. Use is made of the rich symmetry structure to develop a novel set of tools for studying the mechanism by which over-parameterization annihilates spurious minima. Sharp analytic estimates are obtained for the loss and the Hessian spectrum at different minima, and it is proved that adding neurons can turn symmetric spurious minima into saddles; minima of lesser symmetry require more neurons. Using Cauchy's interlacing theorem, we prove the existence of descent directions in certain subspaces arising from the symmetry structure of the loss function. This analytic approach uses techniques, new to the field, from algebraic geometry, representation theory and symmetry breaking, and confirms rigorously the effectiveness of…
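Cauchy's interlacing theorem lets a computation on a small subspace say something about the full Hessian. Suppose H is a symmetric n×n matrix and Q is an n×m matrix with orthonormal columns. Then the eigenvalues μ_1 ≤ ... ≤ μ_m of QᵀHQ and λ_1 ≤ ... ≤ λ_n of H satisfy λ_j ≤ μ_j ≤ λ_{j+n−m}.

In particular, μ_1 < 0 forces λ_1 < 0, so negative curvature found in a subspace is negative curvature of the full loss. The sketch below checks this on a random symmetric matrix. The matrix and subspace stand in for the Hessian and the symmetry-derived subspaces mentioned in the abstract; they are not taken from the paper, and the helper names are hypothetical.

```python
import numpy as np

def restricted_spectrum(H, Q):
    """Ascending eigenvalues of H compressed to span(Q), i.e. of Q^T H Q."""
    return np.linalg.eigvalsh(Q.T @ H @ Q)

def check_interlacing(H, Q, tol=1e-10):
    """Verify Cauchy interlacing: lam[j] <= mu[j] <= lam[j + n - m] (ascending order)."""
    lam = np.linalg.eigvalsh(H)     # eigenvalues of the full matrix
    mu = restricted_spectrum(H, Q)  # eigenvalues of the compression
    n, m = H.shape[0], Q.shape[1]
    return all(lam[j] - tol <= mu[j] <= lam[j + n - m] + tol for j in range(m))

rng = np.random.default_rng(0)
n, m = 8, 3
A = rng.standard_normal((n, n))
H = (A + A.T) / 2                                  # random symmetric stand-in "Hessian"
Q, _ = np.linalg.qr(rng.standard_normal((n, m)))   # orthonormal basis of an m-dim subspace
mu = restricted_spectrum(H, Q)
print(check_interlacing(H, Q))  # True
if mu[0] < 0:
    # Lift the subspace eigenvector u to d = Q u; since Q has orthonormal columns,
    # d is a unit vector with d^T H d = mu[0] < 0, i.e. a negative-curvature direction of H.
    u = np.linalg.eigh(Q.T @ H @ Q)[1][:, 0]
    d = Q @ u
    print(d @ H @ d, mu[0])
```

Working in a low-dimensional subspace is attractive because QᵀHQ is much smaller than H. Interlacing guarantees that any negative eigenvalue found there is not an artifact of the restriction.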
Taxonomy
Topics: Neural Networks and Applications · Advanced Memory and Neural Computing · Neural dynamics and brain function
