Spurious Valleys in Two-layer Neural Network Optimization Landscapes
Luca Venturi, Afonso S. Bandeira, Joan Bruna

TL;DR
This paper studies the topology of the loss surface of two-layer neural networks and shows that a notion of intrinsic dimension determines whether spurious valleys can exist, which in turn bears on whether gradient-descent methods can succeed.
Contribution
The paper introduces a notion of intrinsic dimension and shows that it provides necessary and sufficient conditions for the absence of spurious valleys in the loss landscape of two-layer neural networks.
Findings
Finite intrinsic dimension guarantees that sufficiently overparametrized models have no spurious valleys.
Infinite intrinsic dimension can lead to spurious valleys for certain data distributions.
Spurious valleys are confined to low risk levels and are typically avoided in overparametrized models.
Abstract
Neural networks provide a rich class of high-dimensional, non-convex optimization problems. Despite their non-convexity, gradient-descent methods often successfully optimize these models. This has motivated a recent spur in research attempting to characterize properties of their loss surface that may explain such success. In this paper, we address this phenomenon by studying a key topological property of the loss: the presence or absence of spurious valleys, defined as connected components of sub-level sets that do not include a global minimum. Focusing on a class of two-layer neural networks defined by smooth (but generally non-linear) activation functions, we identify a notion of intrinsic dimension and show that it provides necessary and sufficient conditions for the absence of spurious valleys. More concretely, finite intrinsic dimension guarantees that for sufficiently…
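The definition of a spurious valley given in the abstract (a connected component of a sub-level set that does not include a global minimum) can be made concrete with a small sketch. The example below is an illustration only, not taken from the paper: it uses a hypothetical one-dimensional toy loss sampled on a grid, where connected components of a sub-level set are simply maximal runs of consecutive grid points. The names `spurious_valleys_1d` and `toy_loss` are assumptions introduced here.

```python
import numpy as np


def spurious_valleys_1d(loss_values, level, tol=1e-9):
    """Find spurious valleys of a 1D loss sampled on an ordered grid.

    Returns a list of (start, end) index pairs, one per connected
    component of the sub-level set {loss <= level} that does not
    contain a global minimizer of the sampled loss.
    """
    loss_values = np.asarray(loss_values, dtype=float)
    global_min = loss_values.min()
    in_sublevel = loss_values <= level

    valleys = []
    start = None
    last = len(in_sublevel) - 1
    for i, inside in enumerate(in_sublevel):
        if inside and start is None:
            start = i  # a new component begins here
        if start is not None and (not inside or i == last):
            end = i if inside else i - 1  # the component ends here
            # Spurious if the best value in the component is strictly
            # worse than the global minimum.
            if loss_values[start:end + 1].min() > global_min + tol:
                valleys.append((start, end))
            start = None
    return valleys


def toy_loss(x):
    """Hypothetical tilted double well: global minimum near x = -1,
    a worse local minimum near x = 1."""
    return (x ** 2 - 1) ** 2 + 0.3 * x


xs = np.linspace(-2.0, 2.0, 401)
values = toy_loss(xs)

# At a low level the two wells are disconnected, so the well around
# x = 1 forms a spurious valley; at a high level they merge.
for level in (0.5, 2.0):
    found = [(xs[s], xs[e]) for s, e in spurious_valleys_1d(values, level)]
    print(f"level={level}: spurious valleys on intervals {found}")
```

In this toy case, a descent path started inside the valley around x = 1 cannot reach the global minimum without increasing the loss, which is the obstruction the paper's notion is meant to capture. In higher dimensions the connected components would have to be computed differently, since they are no longer intervals.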
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Topological and Geometric Data Analysis · Sparse and Compressive Sensing Techniques
