The loss landscape of overparameterized neural networks

Y Cooper

arXiv:1804.10200 · cs.LG · April 27, 2018 · 36 citations

TL;DR

This paper investigates the mathematical structure of the loss landscape of overparameterized neural networks, showing that the set of global minima is typically a high-dimensional manifold rather than a set of isolated points, which bears on how the optimization of such networks should be understood.

Contribution

The paper proves that for an overparameterized network with n parameters trained on d < n data points, the global minima usually form an (n-d)-dimensional submanifold of parameter space, in contrast with the discrete global minima one would expect of a typical nonconvex function.
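
One standard way to see where the count n-d comes from is the preimage (regular value) theorem. The sketch below assumes a squared loss and a network that can fit all d data points exactly; the notation $f(x;\theta)$, $F$ and $y$ is introduced here for illustration and is not necessarily the paper's own argument.

```latex
F:\mathbb{R}^{n}\to\mathbb{R}^{d},\qquad
F(\theta) = \bigl(f(x_1;\theta),\dots,f(x_d;\theta)\bigr),\qquad
L(\theta) = \sum_{i=1}^{d}\bigl(f(x_i;\theta)-y_i\bigr)^{2} = \lVert F(\theta)-y\rVert^{2}

M = \{\theta\in\mathbb{R}^{n} : L(\theta)=0\} = F^{-1}(y)

\operatorname{rank} DF(\theta) = d \ \text{ for all } \theta\in M
\;\Longrightarrow\;
M \text{ is a submanifold of } \mathbb{R}^{n},\quad \dim M = n-d
```

By rank-nullity, the tangent space $T_\theta M = \ker DF(\theta)$ then has dimension $n-d$ at every point of $M$.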

Findings

01

Global minima form an (n-d)-dimensional manifold

02

The loss landscape differs from that of a typical nonconvex function

03

Because networks typically have far more parameters than data points, the set of global minima is typically very high-dimensional

Abstract

We explore some mathematical features of the loss landscape of overparameterized neural networks. A priori, one might imagine that the loss function looks like a typical function from $\mathbb{R}^{n}$ to $\mathbb{R}$: in particular, nonconvex, with discrete global minima. In this paper, we prove that in at least one important way, the loss function of an overparameterized neural network does not look like a typical function. If a neural net has $n$ parameters and is trained on $d$ data points, with $n > d$, we show that the locus $M$ of global minima of the loss function $L$ is usually not discrete, but rather an $(n-d)$-dimensional submanifold of $\mathbb{R}^{n}$. In practice, neural nets commonly have orders of magnitude more parameters than data points, so this observation implies that $M$ is typically a very high-dimensional subset of $\mathbb{R}^{n}$.
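
As a numerical sanity check of the dimension count (not the paper's code or proof), the sketch below builds a hypothetical one-hidden-layer tanh network with scalar input, f(x) = sum_j a_j tanh(w_j x + b_j), so n = 3 × hidden. It draws random parameters θ, sets the labels equal to the network's own outputs so that θ is a global minimum with zero squared loss, and computes the rank of the d × n Jacobian of θ ↦ (f(x_1;θ), ..., f(x_d;θ)). If that rank equals d, the implicit function theorem says the zero-loss set is locally an (n-d)-dimensional manifold near θ. The helper names `forward_and_jacobian` and `matrix_rank`, the architecture, and the inputs are all illustrative assumptions.

```python
import math
import random


def forward_and_jacobian(params, xs, hidden):
    """Return f(x_i; theta) for each input and the d x n Jacobian dF/dtheta.

    Hypothetical model: f(x) = sum_j a_j * tanh(w_j * x + b_j), with
    params laid out as [a_1..a_h, w_1..w_h, b_1..b_h] (n = 3 * hidden).
    """
    a = params[:hidden]
    w = params[hidden:2 * hidden]
    b = params[2 * hidden:]
    outputs, jac = [], []
    for x in xs:
        t = [math.tanh(w[j] * x + b[j]) for j in range(hidden)]
        outputs.append(sum(a[j] * t[j] for j in range(hidden)))
        da = t                                                    # df/da_j
        dw = [a[j] * (1 - t[j] ** 2) * x for j in range(hidden)]  # df/dw_j
        db = [a[j] * (1 - t[j] ** 2) for j in range(hidden)]      # df/db_j
        jac.append(da + dw + db)  # one row of length n per data point
    return outputs, jac


def matrix_rank(rows, rel_tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    m = [list(row) for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    scale = max(abs(v) for row in m for v in row) or 1.0
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) <= rel_tol * scale:
            continue  # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(rank + 1, n_rows):
            factor = m[i][col] / m[rank][col]
            for c in range(col, n_cols):
                m[i][c] -= factor * m[rank][c]
        rank += 1
    return rank


random.seed(0)
hidden = 3
n = 3 * hidden                 # number of parameters
xs = [-1.5, -0.5, 0.5, 1.5]    # d distinct training inputs
d = len(xs)                    # n > d: overparameterized

theta = [random.gauss(0.0, 1.0) for _ in range(n)]
# Labels equal to the network's own outputs make theta a zero-loss global minimum.
ys, jac = forward_and_jacobian(theta, xs, hidden)
outputs, _ = forward_and_jacobian(theta, xs, hidden)
residuals = [o - y for o, y in zip(outputs, ys)]

r = matrix_rank(jac)
print(f"n={n}, d={d}, rank of Jacobian={r}, local dimension of M={n - r}")
```

Raising `hidden` increases n while d stays fixed, so the reported local dimension n - d grows. This checks the rank condition at a single point only; it is not a proof that the condition holds everywhere on M.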

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Model Reduction and Neural Networks · Neural Networks and Applications