Beyond the Edge of Stability via Two-step Gradient Updates
Lei Chen, Joan Bruna

TL;DR
This paper investigates why gradient descent can converge beyond the traditional stability threshold by analyzing two-step updates and higher-order derivatives, revealing new conditions for convergence in overparametrized models.
Contribution
It introduces a local third-order derivative condition that guarantees convergence of two-step gradient updates, extending understanding of stability beyond classical limits.
Findings
Characterizes conditions for convergence involving third-order derivatives.
Demonstrates unstable convergence in high-dimensional matrix factorization.
Provides insights into period-2 orbits of gradient descent in complex settings.
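As a toy illustration of the period-2 behaviour mentioned above (not an example taken from the paper), the sketch below runs gradient descent on an assumed one-dimensional loss f(x) = (x^2 - 1)^2 / 4. Its minimiser x = 1 has curvature f''(1) = 2, so the classical threshold is a step size of 1. With a hypothetical step size of 1.05, consecutive iterates keep oscillating around x = 1 while iterates two steps apart settle down.

```python
def grad(x):
    # Gradient of the assumed toy loss f(x) = (x**2 - 1)**2 / 4.
    return x * (x**2 - 1)


def run_gd(x0, step_size, num_steps):
    # Plain gradient descent; returns the full list of iterates.
    xs = [x0]
    for _ in range(num_steps):
        xs.append(xs[-1] - step_size * grad(xs[-1]))
    return xs


# f''(1) = 2, so step sizes above 2 / 2 = 1 are locally unstable at x = 1.
xs = run_gd(x0=1.1, step_size=1.05, num_steps=400)

one_step_gap = abs(xs[-1] - xs[-2])  # distance between consecutive iterates
two_step_gap = abs(xs[-1] - xs[-3])  # distance between iterates two steps apart
print(one_step_gap, two_step_gap)
```

Here the one-step gap stays bounded away from zero while the two-step gap vanishes, which is the signature of a period-2 orbit: the two-step map has a stable fixed point even though the one-step map does not.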
Abstract
Gradient Descent (GD) is a powerful workhorse of modern machine learning thanks to its scalability and efficiency in high-dimensional spaces. Its ability to find local minimisers is only guaranteed for losses with Lipschitz gradients, where it can be seen as a 'bona-fide' discretisation of an underlying gradient flow. Yet, many ML setups involving overparametrised models do not fall into this problem class, which has motivated research beyond the so-called "Edge of Stability" (EoS), where the step-size crosses the admissibility threshold inversely proportional to the Lipschitz constant above. Perhaps surprisingly, GD has been empirically observed to still converge regardless of local instability and oscillatory behavior. The incipient theoretical analysis of this phenomenon has mainly focused on the overparametrised regime, where the effect of choosing a large learning rate may be…
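The admissibility threshold mentioned in the abstract can be seen on a quadratic, the simplest setting where it is exact. A minimal sketch, assuming a hypothetical loss f(x) = (lam / 2) * x^2 whose gradient has Lipschitz constant lam: each GD update multiplies x by (1 - step_size * lam), so iterates shrink when step_size < 2 / lam and grow when step_size > 2 / lam. The values of lam and the step sizes are illustrative choices, not taken from the paper.

```python
def gd_on_quadratic(x0, lam, step_size, num_steps):
    # Gradient descent on f(x) = 0.5 * lam * x**2, whose gradient is lam * x.
    x = x0
    for _ in range(num_steps):
        x = x - step_size * lam * x
    return x


lam = 1.0  # assumed curvature (Lipschitz constant of the gradient)
below = gd_on_quadratic(x0=1.0, lam=lam, step_size=1.9 / lam, num_steps=100)
above = gd_on_quadratic(x0=1.0, lam=lam, step_size=2.1 / lam, num_steps=100)
print(abs(below), abs(above))
```

On a pure quadratic the iterates diverge past the threshold, so any convergence beyond it has to come from higher-order terms of the loss, which is where the paper's derivative conditions enter.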
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Model Reduction and Neural Networks · Quantum many-body systems
