There is a Singularity in the Loss Landscape
Mark Lowell

TL;DR
This paper shows experimentally that a singularity forms in the loss landscape of neural networks as the dataset grows. Training takes place near this singularity, which explains phenomena such as training on the edge of stability and the concentration of the gradient in a top subspace.
Contribution
It experimentally demonstrates the existence of a loss landscape singularity and links it to observed training phenomena in neural networks.
Findings
As the dataset grows, a point forms in parameter space where the magnitude of the loss gradient becomes unbounded.
Gradient descent rapidly brings the network close to this singularity, and further training takes place near it.
Near the singularity, the top subspace contributes little to learning even though it makes up the majority of the gradient.
Abstract
Despite the widespread adoption of neural networks, their training dynamics remain poorly understood. We show experimentally that as the size of the dataset increases, a point forms where the magnitude of the gradient of the loss becomes unbounded. Gradient descent rapidly brings the network close to this singularity in parameter space, and further training takes place near it. This singularity explains a variety of phenomena recently observed in the Hessian of neural network loss functions, such as training on the edge of stability and the concentration of the gradient in a top subspace. Once the network approaches the singularity, the top subspace contributes little to learning, even though it constitutes the majority of the gradient.
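The two Hessian-related quantities named in the abstract can be made concrete on a toy problem. The sketch below is not the paper's code or experimental setup: it assumes a hand-picked quadratic loss 0.5 * theta^T H theta with a fixed symmetric matrix H, and the helper names (`top_eigvecs`, `top_subspace_fraction`, `sharpness_threshold`) are hypothetical. It estimates the top eigenpairs of H by power iteration, measures what fraction of the squared gradient norm lies in the top subspace, and compares the largest eigenvalue with 2 / learning rate, the value around which sharpness is commonly reported to hover for gradient descent at the edge of stability.

```python
import math

def matvec(H, v):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(h * x for h, x in zip(row, v)) for row in H]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def top_eigvecs(H, k, iters=500):
    """Top-k eigenpairs of a symmetric PSD matrix via power iteration
    with Gram-Schmidt deflation against the eigenvectors already found."""
    vals, vecs = [], []
    n = len(H)
    for i in range(k):
        # Deterministic start vector with a component in every direction.
        v = normalize([1.0 + 0.1 * (j + i) for j in range(n)])
        for _ in range(iters):
            w = matvec(H, v)
            for u in vecs:
                c = dot(w, u)
                w = [wi - c * ui for wi, ui in zip(w, u)]
            v = normalize(w)
        vecs.append(v)
        vals.append(dot(v, matvec(H, v)))  # Rayleigh quotient
    return vals, vecs

def top_subspace_fraction(grad, vecs):
    """Fraction of the squared gradient norm inside span(vecs),
    assuming the vectors in vecs are orthonormal."""
    return sum(dot(grad, u) ** 2 for u in vecs) / dot(grad, grad)

def sharpness_threshold(lr):
    """Stability limit 2 / lr for gradient descent on a quadratic."""
    return 2.0 / lr

# Toy Hessian (an assumption for illustration): one sharp direction, three flat ones.
H = [[100.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 0.5, 0.0],
     [0.0, 0.0, 0.0, 0.1]]
theta = [1.0, 1.0, 1.0, 1.0]
grad = matvec(H, theta)  # gradient of 0.5 * theta^T H theta

vals, vecs = top_eigvecs(H, k=1)
print("sharpness (top eigenvalue):", vals[0])
print("2 / lr at lr = 0.02:", sharpness_threshold(0.02))
print("gradient fraction in top subspace:", top_subspace_fraction(grad, vecs))
```

In this toy setting nearly all of the gradient lies along the single sharp direction. For a real network the Hessian is not available as an explicit matrix, so the matrix-vector product would have to be replaced by a Hessian-vector product from an automatic differentiation library.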
Taxonomy
Topics: Neural Networks and Applications · Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques
