The instabilities of large learning rate training: a loss landscape view

Lawrence Wang; Stephen Roberts

arXiv:2307.11948 · cs.LG · July 25, 2023 · 1 citation

Open Access

TL;DR

This paper studies how large learning rates affect the stability of neural network training by tracking the Hessian of the loss during training. It reports two phenomena, landscape flattening and landscape shift, that are closely tied to those instabilities.

Contribution

The paper analyses loss landscape instabilities at large learning rates by examining the Hessian matrix during training, and identifies two phenomena: landscape flattening and landscape shift.

Findings

01 Identification of landscape flattening during unstable training

02 Observation of landscape shift phenomena

03 Linking landscape behaviors to training instabilities

Abstract

Modern neural networks are undeniably successful. Numerous works study how the curvature of loss landscapes can affect the quality of solutions. In this work we study the loss landscape by considering the Hessian matrix during network training with large learning rates - an attractive regime that is (in)famously unstable. We characterise the instabilities of gradient descent, and we observe the striking phenomena of landscape flattening and landscape shift, both of which are intimately connected to the instabilities of training.
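
As background for why the Hessian matters in the large learning rate regime, the sketch below shows the textbook stability condition for gradient descent on a quadratic loss: the iterates converge only when the learning rate is below 2 / lambda_max, where lambda_max is the largest Hessian eigenvalue. This is a standard illustration and not the paper's method or experiments; the toy curvatures, function names, and step count are all assumptions made for the example.

```python
def gradient_descent_losses(curvatures, learning_rate, steps=50, start=1.0):
    """Run gradient descent on L(w) = 0.5 * sum(c_i * w_i**2); return the loss at each step."""
    weights = [start for _ in curvatures]
    losses = []
    for _ in range(steps):
        losses.append(0.5 * sum(c * w * w for c, w in zip(curvatures, weights)))
        # The gradient is c_i * w_i, so each coordinate is scaled by (1 - lr * c_i).
        weights = [w - learning_rate * c * w for c, w in zip(curvatures, weights)]
    return losses


def stability_threshold(curvatures):
    """Learning rate above which this quadratic diverges: 2 / lambda_max of the Hessian."""
    return 2.0 / max(curvatures)


# Hypothetical Hessian eigenvalues for a toy diagonal quadratic (not from the paper).
curvatures = [10.0, 1.0]
threshold = stability_threshold(curvatures)

# One run just below the threshold, one just above it.
stable = gradient_descent_losses(curvatures, learning_rate=0.9 * threshold)
unstable = gradient_descent_losses(curvatures, learning_rate=1.1 * threshold)

print(f"threshold learning rate: {threshold:.3f}")
print(f"final loss below threshold: {stable[-1]:.3e}")
print(f"final loss above threshold: {unstable[-1]:.3e}")
```

On this toy problem the loss shrinks just below the threshold and grows without bound just above it. Real networks are not quadratic and their Hessian changes during training, which is the setting the abstract describes.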

Taxonomy

Topics: Model Reduction and Neural Networks · Quantum many-body systems · Advanced Thermodynamics and Statistical Mechanics