Learning Dynamics of Deep Linear Networks Beyond the Edge of Stability

Avrajit Ghosh; Soo Min Kwon; Rongrong Wang; Saiprasad Ravishankar; Qing Qu

arXiv:2502.20531 · stat.ML · March 3, 2025

TL;DR

This paper analyzes the learning dynamics of deep linear networks beyond the edge of stability, revealing loss oscillations, chaos, and the role of symmetry-breaking in training behavior.

Contribution

The paper provides a fine-grained theoretical analysis of how deep linear networks (DLNs) behave beyond the edge of stability (EOS), covering loss oscillations, chaos, and the impact of symmetry-breaking on the training dynamics.
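
To make "beyond EOS" concrete, here is a minimal sketch for the simplest possible case, assuming a scalar deep linear network: $L$ layers $w_1, \dots, w_L$, each a single number, trained on $f(w) = \frac12 (w_L \cdots w_1 - s)^2$ for a target $s$. This toy setup and the function names are illustrative assumptions, not the paper's matrix setting. At any global minimum the residual vanishes, so the Hessian is the rank-one matrix $g g^\top$ with $g_l = \prod_{k \ne l} w_k$, and its largest eigenvalue (the sharpness) is $\|g\|^2$. By the AM-GM inequality the flattest minimum is the balanced one, $|w_l| = |s|^{1/L}$, with sharpness $L |s|^{2 - 2/L}$. Since GD with step size $\eta$ can only settle at a minimum whose sharpness is at most $2/\eta$, this gives a learning-rate threshold above which no global minimum is stable.

```python
import math


def sharpness_at_minimum(weights):
    """Largest Hessian eigenvalue of f(w) = 0.5 * (prod(w) - s)**2 at a global minimum.

    Assumes prod(weights) == s, so the residual term drops out and the Hessian
    is the rank-one matrix g g^T with g_l = prod_{k != l} w_k.
    """
    grads = [math.prod(weights[:l] + weights[l + 1:]) for l in range(len(weights))]
    return sum(g * g for g in grads)  # the only nonzero eigenvalue of g g^T


def eos_learning_rate(depth, target):
    """Largest eta for which the flattest (balanced) minimum is linearly stable for GD.

    Balanced minimum: |w_l| = |s|**(1/L), sharpness = L * |s|**(2 - 2/L).
    For eta above 2 / sharpness, every global minimum is sharper than 2 / eta.
    """
    return 2.0 / (depth * abs(target) ** (2.0 - 2.0 / depth))


# Balanced vs. unbalanced two-layer factorizations of s = 1.
print(sharpness_at_minimum([1.0, 1.0]))        # 2.0 (balanced)
print(sharpness_at_minimum([2.0, 0.5]))        # 4.25 (unbalanced, same product)
print(eos_learning_rate(depth=2, target=1.0))  # 1.0
```

With $s = 1$ and two layers the threshold is $\eta = 1$: at larger step sizes every global minimum is sharper than $2/\eta$, so GD cannot converge to any of them, a toy version of training beyond EOS.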

Findings

01

Loss oscillations follow a period-doubling route to chaos.

02

Loss oscillations occur within a small subspace whose dimension is precisely characterized by the learning rate.

03

At EOS, the balancing gap, which gradient flow conserves, is no longer conserved and instead decays monotonically.
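
The period-doubling finding can be reproduced in a toy setting. The sketch below assumes a scalar two-layer network trained on $\frac12 (ab - 1)^2$ from a balanced start $a = b = w$; both layers then receive identical updates, so GD reduces to the one-dimensional map $w \mapsto w - \eta (w^2 - 1) w$. The minimum $w = 1$ has sharpness 2, so it is stable only for $\eta < 1$. For $\eta > 1$ a short calculation gives a 2-cycle $\{q, p\}$ with $p + q = \sqrt{1 + 3/\eta}$, $p - q = \sqrt{1 - 1/\eta}$ and multiplier $9 - 2(1 + \eta)^2$, so it is stable for $1 < \eta < \sqrt{5} - 1 \approx 1.236$ and then loses stability to a 4-cycle. This illustrates the phenomenon only; it is not the paper's matrix-valued analysis, and the helper names are hypothetical.

```python
import math


def balanced_gd_orbit(eta, depth=2, target=1.0, w0=0.9, steps=4000):
    """Iterate GD on 0.5 * (w_L * ... * w_1 - target)**2 from a balanced start.

    Toy assumption: every layer starts at the same scalar w0. Each layer then
    receives the same gradient (w**depth - target) * w**(depth - 1), so the
    layers stay equal and GD collapses to a 1-D map on w.
    (Large eta can make the orbit diverge and raise OverflowError.)
    """
    w = w0
    orbit = [w]
    for _ in range(steps):
        w = w - eta * (w ** depth - target) * w ** (depth - 1)
        orbit.append(w)
    return orbit


def detect_period(orbit, max_period=8, window=64, tol=1e-9):
    """Smallest k <= max_period such that the orbit's tail repeats every k steps, else None."""
    tail = orbit[-window:]
    for k in range(1, max_period + 1):
        if all(abs(tail[t + k] - tail[t]) < tol for t in range(len(tail) - k)):
            return k
    return None


def two_cycle(eta):
    """Closed-form 2-cycle (q, p) of w -> w - eta * (w**2 - 1) * w, valid for eta > 1."""
    total = math.sqrt(1.0 + 3.0 / eta)   # p + q
    spread = math.sqrt(1.0 - 1.0 / eta)  # p - q
    return (total - spread) / 2.0, (total + spread) / 2.0


for eta in (0.5, 1.1, 1.2, 1.25):
    print(eta, detect_period(balanced_gd_orbit(eta)))
```

The loss is a function of $w$, so its period divides that of $w$. For $\eta = 1.25$ both the fixed point and the 2-cycle are unstable (multipliers $-1.5$ and $-1.125$), so the detector reports no period of 1 or 2 there.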

Abstract

Deep neural networks trained using gradient descent with a fixed learning rate $\eta$ often operate in the regime of "edge of stability" (EOS), where the largest eigenvalue of the Hessian equilibrates about the stability threshold $2/\eta$. In this work, we present a fine-grained analysis of the learning dynamics of (deep) linear networks (DLNs) within the deep matrix factorization loss beyond EOS. For DLNs, loss oscillations beyond EOS follow a period-doubling route to chaos. We theoretically analyze the regime of the 2-period orbit and show that the loss oscillations occur within a small subspace, with the dimension of the subspace precisely characterized by the learning rate. The crux of our analysis lies in showing that the symmetry-induced conservation law for gradient flow, defined as the balancing gap among the singular values across layers, breaks at EOS and decays monotonically…
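
The conservation law behind this argument can be checked by hand in a scalar toy model. The sketch below assumes $L$ scalar layers trained on $\frac12 (w_L \cdots w_1 - s)^2$ and uses $w_{l+1}^2 - w_l^2$ as a stand-in for the balancing gap between singular values; the function names are hypothetical. With $r = w_L \cdots w_1 - s$ and $g_l = \prod_{k \ne l} w_k$, gradient flow gives $\frac{d}{dt}(w_{l+1}^2 - w_l^2) = -2r(w_{l+1} g_{l+1} - w_l g_l) = 0$, because $w_l g_l = w_{l+1} g_{l+1}$ is the full product. One GD step with step size $\eta$ instead multiplies the gap by $1 - \eta^2 r^2 \prod_{k \notin \{l, l+1\}} w_k^2$, so it shrinks in magnitude at every step where $0 < \eta^2 r^2 \prod_{k \notin \{l, l+1\}} w_k^2 < 2$. Persistent oscillations keep $r$ away from zero, which is consistent with, though not a proof of, the monotone decay described above.

```python
import math


def gd_step(weights, eta, target):
    """One GD step on f(w) = 0.5 * (prod(w) - target)**2 for scalar layers w_1..w_L.

    Returns the updated weights and the residual r = prod(w) - target used in the step.
    """
    r = math.prod(weights) - target
    grads = [math.prod(weights[:l] + weights[l + 1:]) for l in range(len(weights))]
    return [w - eta * r * g for w, g in zip(weights, grads)], r


def balancing_gaps(weights):
    """Scalar stand-in for the balancing gap: w_{l+1}**2 - w_l**2 for adjacent layers."""
    return [weights[l + 1] ** 2 - weights[l] ** 2 for l in range(len(weights) - 1)]


def predicted_gap_factor(weights, eta, target, l):
    """Factor 1 - eta**2 * r**2 * prod_{k not in {l, l+1}} w_k**2 applied to gap l by one GD step."""
    r = math.prod(weights) - target
    others = math.prod(weights[:l] + weights[l + 2:])  # drop layers l and l+1
    return 1.0 - (eta * r) ** 2 * others ** 2


w = [1.3, 0.7, 1.1]
for step in range(5):
    factors = [predicted_gap_factor(w, 0.3, 1.0, l) for l in range(len(w) - 1)]
    before = balancing_gaps(w)
    w, r = gd_step(w, 0.3, 1.0)
    after = balancing_gaps(w)
    # Observed ratio after/before should match the predicted factor.
    print(step, r, [a / b for a, b in zip(after, before)], factors)
```

In the two-layer case the product over the remaining layers is empty, so the factor is simply $1 - \eta^2 r^2$.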
