There is a Singularity in the Loss Landscape

Mark Lowell

arXiv:2201.06964 · cs.LG · July 23, 2024

TL;DR

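This paper shows experimentally that, as dataset size grows, a singularity forms in the loss landscape of neural networks: a point where the magnitude of the gradient becomes unbounded. Training takes place near this point, which explains phenomena such as training on the edge of stability and the concentration of the gradient in a top subspace.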

Contribution

It experimentally demonstrates the existence of a loss landscape singularity and links it to observed training phenomena in neural networks.

Findings

01

Gradient magnitude becomes unbounded at a singularity as dataset size increases.

02

Gradient descent rapidly brings the network close to this singularity, and further training takes place near it.

03

The top subspace of the gradient contributes little to learning near the singularity.
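
One way to make "the top subspace of the gradient" concrete is to measure how much of the squared gradient norm lies in the span of a few leading directions. The sketch below is only an illustration under stated assumptions: the function name `top_subspace_fraction` is hypothetical, the basis vectors are assumed to be orthonormal (for example, leading Hessian eigenvectors computed elsewhere), and the numbers are toy values, not results from the paper.

```python
def top_subspace_fraction(grad, top_basis):
    """Fraction of the squared gradient norm lying in span(top_basis).

    Assumption: top_basis is a list of orthonormal vectors, each the same
    length as grad. How they are obtained is outside this sketch.
    """
    total = sum(g * g for g in grad)
    if total == 0.0:
        return 0.0
    # Sum of squared projections onto each orthonormal basis vector.
    projected = sum(
        sum(g * v for g, v in zip(grad, vec)) ** 2 for vec in top_basis
    )
    return projected / total


# Toy values: a 2D gradient and a one-dimensional "top" subspace.
grad = [3.0, 4.0]
top_basis = [[0.0, 1.0]]
fraction = top_subspace_fraction(grad, top_basis)
print(fraction)
```

A fraction close to 1 means the gradient is concentrated in the chosen subspace. The finding above is that this can hold while the subspace still contributes little to reducing the loss.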

Abstract

Despite the widespread adoption of neural networks, their training dynamics remain poorly understood. We show experimentally that as the size of the dataset increases, a point forms where the magnitude of the gradient of the loss becomes unbounded. Gradient descent rapidly brings the network close to this singularity in parameter space, and further training takes place near it. This singularity explains a variety of phenomena recently observed in the Hessian of neural network loss functions, such as training on the edge of stability and the concentration of the gradient in a top subspace. Once the network approaches the singularity, the top subspace contributes little to learning, even though it constitutes the majority of the gradient.
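
As background for the phrase "edge of stability": on a one-dimensional quadratic loss with curvature c, gradient descent with step size lr multiplies the parameter by (1 - lr * c) at each step, so it converges only when lr < 2 / c. The sketch below illustrates that threshold on a toy problem. It is not the paper's experimental setup, and the function name `gradient_descent_1d` and all constants are assumptions chosen for illustration.

```python
def gradient_descent_1d(curvature, w0, lr, steps):
    """Gradient descent on L(w) = 0.5 * curvature * w**2, starting at w0."""
    w = w0
    for _ in range(steps):
        grad = curvature * w  # dL/dw for the quadratic above
        w -= lr * grad
    return w


curvature = 100.0              # toy curvature (assumed value)
threshold = 2.0 / curvature    # step sizes above this diverge

# Slightly below the threshold the iterate shrinks toward 0.
stable = gradient_descent_1d(curvature, 1.0, 0.9 * threshold, 50)
# Slightly above it the iterate grows without bound.
unstable = gradient_descent_1d(curvature, 1.0, 1.1 * threshold, 50)

print(abs(stable), abs(unstable))
```

Training "on the edge of stability" refers to the regime where the largest curvature of the loss sits near this 2 / lr threshold, rather than comfortably below it.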

Taxonomy

Topics: Neural Networks and Applications · Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques