The instabilities of large learning rate training: a loss landscape view
Lawrence Wang, Stephen Roberts

TL;DR
This paper investigates how large learning rates affect the stability of neural network training by analyzing loss landscape curvature through the Hessian. It reports two phenomena, landscape flattening and landscape shift, that are closely connected to training instabilities.
Contribution
The paper analyzes training instabilities at large learning rates by examining the Hessian matrix of the loss during training, and identifies two phenomena tied to those instabilities: landscape flattening and landscape shift.
Findings
Identification of landscape flattening during unstable training
Observation of landscape shift phenomena
Linking landscape behaviors to training instabilities
Abstract
Modern neural networks are undeniably successful. Numerous works study how the curvature of loss landscapes can affect the quality of solutions. In this work we study the loss landscape by considering the Hessian matrix during network training with large learning rates - an attractive regime that is (in)famously unstable. We characterise the instabilities of gradient descent, and we observe the striking phenomena of landscape flattening and landscape shift, both of which are intimately connected to the instabilities of training.
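As a rough illustration of why the large learning rate regime is unstable, the sketch below runs gradient descent on a toy quadratic loss, not the authors' networks or experimental setup. It estimates the largest Hessian eigenvalue (often called the sharpness) by power iteration on finite-difference Hessian-vector products, then compares a learning rate just below and just above the classical threshold 2 / sharpness for a quadratic. The names `CURVATURES`, `grad`, `hvp`, `top_eigenvalue`, and `run_gd` are hypothetical helpers introduced here for illustration only.

```python
import math

# Assumed toy loss: L(w) = 0.5 * sum(c_i * w_i^2), so the Hessian is diag(CURVATURES).
CURVATURES = (10.0, 1.0)


def loss(w):
    return 0.5 * sum(c * wi * wi for c, wi in zip(CURVATURES, w))


def grad(w):
    # Analytic gradient of the toy quadratic loss.
    return [c * wi for c, wi in zip(CURVATURES, w)]


def hvp(w, v, eps=1e-4):
    # Central finite-difference Hessian-vector product:
    # H v ~ (grad(w + eps * v) - grad(w - eps * v)) / (2 * eps)
    g_plus = grad([wi + eps * vi for wi, vi in zip(w, v)])
    g_minus = grad([wi - eps * vi for wi, vi in zip(w, v)])
    return [(a - b) / (2 * eps) for a, b in zip(g_plus, g_minus)]


def top_eigenvalue(w, iters=100):
    # Power iteration: repeatedly apply the Hessian and renormalise.
    v = [1.0] * len(w)
    for _ in range(iters):
        hv = hvp(w, v)
        norm = math.sqrt(sum(x * x for x in hv))
        v = [x / norm for x in hv]
    # Rayleigh quotient v^T H v of the converged unit vector.
    return sum(a * b for a, b in zip(v, hvp(w, v)))


def run_gd(lr, steps=50, w0=(1.0, 1.0)):
    # Plain full-batch gradient descent; returns the final loss.
    w = list(w0)
    for _ in range(steps):
        w = [wi - lr * gi for wi, gi in zip(w, grad(w))]
    return loss(w)


if __name__ == "__main__":
    sharpness = top_eigenvalue([1.0, 1.0])
    threshold = 2.0 / sharpness
    print("estimated sharpness:", sharpness)
    print("stability threshold 2 / sharpness:", threshold)
    print("final loss just below threshold:", run_gd(0.95 * threshold))
    print("final loss just above threshold:", run_gd(1.05 * threshold))
```

On a quadratic the threshold is exact: each step multiplies the component along an eigenvector by (1 - lr * eigenvalue), which grows in magnitude once lr exceeds 2 / eigenvalue. Neural network losses are not quadratic and their Hessian changes during training, which is where the behaviour described in the abstract departs from this toy picture.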
Taxonomy
Topics: Model Reduction and Neural Networks · Quantum many-body systems · Advanced Thermodynamics and Statistical Mechanics
