The loss landscape of overparameterized neural networks
Y Cooper

TL;DR
This paper investigates the mathematical structure of the loss landscape in overparameterized neural networks, revealing that the set of global minima forms a high-dimensional manifold rather than isolated points, which impacts understanding of neural network optimization.
Contribution
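The paper proves that for a neural network with n parameters trained on d data points, with n > d, the set of global minima of the loss is usually an (n-d)-dimensional submanifold of parameter space rather than the discrete set of points one would expect for a typical nonconvex function.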
Findings
Global minima form an (n-d)-dimensional manifold
Loss landscape differs from typical nonconvex functions
The set of global minima is typically very high-dimensional, since networks commonly have orders of magnitude more parameters than data points
Abstract
We explore some mathematical features of the loss landscape of overparameterized neural networks. A priori one might imagine that the loss function looks like a typical function from $\mathbb{R}^n$ to $\mathbb{R}$ - in particular, nonconvex, with discrete global minima. In this paper, we prove that in at least one important way, the loss function of an overparameterized neural network does not look like a typical function. If a neural net has $n$ parameters and is trained on $d$ data points, with $n > d$, we show that the locus $M$ of global minima of $L$ is usually not discrete, but rather an $n - d$ dimensional submanifold of $\mathbb{R}^n$. In practice, neural nets commonly have orders of magnitude more parameters than data points, so this observation implies that $M$ is typically a very high-dimensional subset of $\mathbb{R}^n$.
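The dimension count in the abstract can be checked numerically in the simplest possible setting. The sketch below is an illustration, not the paper's construction: it assumes a linear model with squared-error loss, where the set of zero-loss parameters is an affine subspace, and the names `minima_dimension`, `loss`, `w0`, and `null_basis` are hypothetical. With n = 10 parameters and d = 3 random data points, the zero-loss set should have dimension n - d = 7.

```python
import numpy as np

# Toy setting (an assumption for illustration): a linear model f(x; w) = w . x
# with n parameters, fit to d data points with n > d, under squared-error loss.
rng = np.random.default_rng(0)
n, d = 10, 3
X = rng.standard_normal((d, n))   # d data points, each with n features
y = rng.standard_normal(d)        # d targets


def loss(w):
    """Squared-error loss of the linear model on the d data points."""
    return float(np.sum((X @ w - y) ** 2))


def minima_dimension(X):
    """Dimension of {w : X @ w = y}, assuming the system is consistent."""
    return X.shape[1] - np.linalg.matrix_rank(X)


# One global minimum: the minimum-norm interpolating solution.
w0 = np.linalg.pinv(X) @ y

# Rows of Vt beyond the first d span the null space of X (X has rank d here),
# i.e. the directions along which the loss stays at its minimum.
_, _, Vt = np.linalg.svd(X)
null_basis = Vt[d:]               # shape (n - d, n)

# Moving from w0 along any null-space direction gives another global minimum.
w_moved = w0 + null_basis.T @ rng.standard_normal(n - d)

print("dimension of the set of global minima:", minima_dimension(X))
print("loss at w0:", loss(w0))
print("loss after moving along the minima set:", loss(w_moved))
```

In this linear case the minima form a flat (n-d)-dimensional subspace. The abstract's claim is that for nonlinear networks the set is usually still (n-d)-dimensional, though in general a curved submanifold.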
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Model Reduction and Neural Networks · Neural Networks and Applications
