Training on the Edge of Stability Is Caused by Layerwise Jacobian Alignment

Mark Lowell; Catharine Kastner

arXiv:2406.00127·stat.ML·June 4, 2024

TL;DR

This paper investigates why neural network training reaches the edge of stability. It shows that alignment of the layerwise Jacobian matrices drives the rise in sharpness, and that the degree of alignment scales with dataset size by a power law.

Contribution

It identifies layerwise Jacobian alignment as the cause of the edge of stability during training, and uses an exponential Euler solver to train the network without entering that regime.

Findings

01

Layerwise Jacobian matrices become aligned during training.

02

This alignment causes the rise in the sharpness of the Hessian of the training loss.

03

The degree of alignment scales with dataset size by a power law.
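
As a rough illustration of what "alignment" of layerwise Jacobians can mean, the sketch below compares the spectral norm of a product of two 2x2 matrices with the product of their individual spectral norms. The matrices, the helper names (`matmul2`, `spectral_norm2`, `alignment_ratio`), and the ratio itself are assumptions chosen for this toy example; the paper's own alignment measure may be defined differently.

```python
import math

def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def spectral_norm2(m):
    """Largest singular value of a 2x2 matrix via the eigenvalues of m^T m."""
    (a, b), (c, d) = m
    # Entries of the symmetric matrix m^T m = [[p, q], [q, r]].
    p, q, r = a * a + c * c, a * b + c * d, b * b + d * d
    top_eig = 0.5 * (p + r) + math.sqrt(0.25 * (p - r) ** 2 + q * q)
    return math.sqrt(top_eig)

def alignment_ratio(j_out, j_in):
    """Hypothetical measure: ||J_out J_in|| / (||J_out|| * ||J_in||), at most 1."""
    return spectral_norm2(matmul2(j_out, j_in)) / (
        spectral_norm2(j_out) * spectral_norm2(j_in))

# Both toy "Jacobians" stretch the same direction, so their gains multiply.
aligned = alignment_ratio([[2.0, 0.0], [0.0, 1.0]], [[3.0, 0.0], [0.0, 1.0]])

# The second matrix stretches the other axis, so the combined gain is smaller.
misaligned = alignment_ratio([[2.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 3.0]])
```

When the ratio is close to 1, a perturbation along the dominant input direction is amplified by every factor in turn, which is the sense in which a small change near the inputs can produce a large change at the outputs.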

Abstract

During neural network training, the sharpness of the Hessian matrix of the training loss rises until training is on the edge of stability. As a result, even nonstochastic gradient descent does not accurately model the underlying dynamical system defined by the gradient flow of the training loss. We use an exponential Euler solver to train the network without entering the edge of stability, so that we accurately approximate the true gradient descent dynamics. We demonstrate experimentally that the increase in the sharpness of the Hessian matrix is caused by the layerwise Jacobian matrices of the network becoming aligned, so that a small change in the network preactivations near the inputs of the network can cause a large change in the outputs of the network. We further demonstrate that the degree of alignment scales with the size of the dataset by a power law with a coefficient of…
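
The contrast between gradient descent and an exponential Euler step can be sketched on a one-dimensional quadratic loss, where the sharpness is simply the curvature. This is a toy model under stated assumptions, not the solver used in the paper: the loss 0.5 * sharpness * x^2, the step size, and the sharpness value are all hypothetical. On this loss, gradient descent is stable only while the sharpness stays below 2 / step, whereas the exponential Euler update reproduces the gradient flow exactly for any step size.

```python
import math

def gradient_descent(x0, sharpness, step, n_steps):
    """Explicit Euler on dx/dt = -sharpness * x, i.e. plain gradient descent."""
    x = x0
    for _ in range(n_steps):
        x = x - step * sharpness * x
    return x

def exponential_euler(x0, sharpness, step, n_steps):
    """Exponential Euler; for this linear flow each step is the exact solution."""
    x = x0
    for _ in range(n_steps):
        x = math.exp(-step * sharpness) * x
    return x

step = 0.1               # hypothetical learning rate
threshold = 2.0 / step   # sharpness above which gradient descent diverges here
sharp = 25.0             # hypothetical sharpness, chosen above the threshold

gd_final = gradient_descent(1.0, sharp, step, 20)   # oscillates and grows
ee_final = exponential_euler(1.0, sharp, step, 20)  # decays toward the minimum
```

With these values each gradient descent step multiplies the iterate by 1 - 0.1 * 25 = -1.5, so the iterate flips sign and grows, while the exponential Euler iterate shrinks by a factor of exp(-2.5) per step.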

Taxonomy

Topics: Robotic Mechanisms and Dynamics