Convergence Rates for Gradient Descent on the Edge of Stability in Overparametrised Least Squares

Lachlan Ewen MacDonald; Hancheng Min; Leandro Palma; Salma Tarmoun; Ziqing Xu; René Vidal

arXiv:2510.17506·cs.LG·October 21, 2025

TL;DR

This paper analyses the convergence of gradient descent with large learning rates in overparametrised least squares, identifying three regimes, set by the size of the learning rate, whose dynamics relate to the edge of stability phenomenon.

Contribution

It provides the first detailed convergence rates for gradient descent (GD) in the edge of stability regime, by exploiting the fact that in overparametrised models the set of global minimisers forms a Riemannian manifold.

Findings

01

Transient instability overcome in finite time in the subcritical regime.

02

Power-law convergence toward flat minima in the critical regime.

03

Linear convergence to a period-two orbit in the supercritical regime.
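As a reference point for what "large" means, the standard linear stability condition for GD at a minimiser is sketched below. This summary does not state how the paper defines the boundaries between the three regimes, so relating them to this threshold is an assumption. Linearising the update around a minimiser $x^\star$ with Hessian $H$ (where $\nabla L(x^\star) = 0$):

```latex
\begin{aligned}
x_{t+1} &= x_t - \eta\,\nabla L(x_t), \qquad e_t := x_t - x^\star, \quad H := \nabla^2 L(x^\star),\\
e_{t+1} &\approx (I - \eta H)\, e_t,\\
\text{component along an eigenvector with eigenvalue } \lambda > 0:\quad
|1 - \eta\lambda| < 1 &\iff \lambda < 2/\eta .
\end{aligned}
```

Directions tangent to the set of minimisers have $\lambda = 0$ and are neither damped nor amplified at first order, so linear stability is decided by the sharpness $\lambda_{\max}(H)$: a minimiser is linearly stable only if $\lambda_{\max}(H) < 2/\eta$. The "edge of stability" refers to training in which the sharpness hovers around $2/\eta$ instead of staying below it.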

Abstract

Classical optimisation theory guarantees monotonic objective decrease for gradient descent (GD) when it is run in a small step size, or "stable", regime. In contrast, gradient descent on neural networks is frequently run in a large step size regime called the "edge of stability", in which the objective decreases non-monotonically and GD shows an observed implicit bias towards flat minima. In this paper, we take a step toward quantifying this phenomenon by providing convergence rates for gradient descent with large learning rates in an overparametrised least squares setting. The key insight behind our analysis is that, as a consequence of overparametrisation, the set of global minimisers forms a Riemannian manifold $M$, which enables the decomposition of the GD dynamics into components parallel and orthogonal to $M$. The parallel component corresponds to Riemannian gradient descent on the…
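The manifold picture above can be seen in a two-parameter toy problem. The sketch below is a hypothetical illustration, not the paper's model or analysis: it fits the scalar target 1 with the product $uv$, so $L(u,v) = \tfrac12(uv-1)^2$, whose global minimisers form the hyperbola $uv = 1$. At a minimiser the Hessian has eigenvalues $0$ (tangent to the hyperbola) and $u^2+v^2$, so the flattest minimisers are $u = v = \pm 1$, with sharpness $2$. The function names, the starting point $(2, 0.1)$ and the step sizes $0.05$ and $0.6$ are arbitrary choices made for this example.

```python
import numpy as np

# Hypothetical toy model (an illustrative assumption, not the paper's setting):
# fit the scalar target 1 with a product of two parameters,
#     L(u, v) = 0.5 * (u * v - 1)^2.
# The global minimisers form the hyperbola u * v = 1, a one-dimensional manifold.


def loss(w) -> float:
    u, v = w
    return 0.5 * (u * v - 1.0) ** 2


def grad(w) -> np.ndarray:
    u, v = w
    r = u * v - 1.0  # residual
    return np.array([r * v, r * u])


def sharpness(w) -> float:
    """Largest Hessian eigenvalue; equals u^2 + v^2 at any point with u * v = 1."""
    u, v = w
    off = 2.0 * u * v - 1.0  # mixed partial derivative d^2 L / du dv
    hess = np.array([[v * v, off], [off, u * u]])
    return float(np.linalg.eigvalsh(hess)[-1])  # eigvalsh sorts ascending


def gd(w0, eta: float, steps: int = 500):
    """Plain gradient descent; returns the final iterate and the loss history."""
    w = np.array(w0, dtype=float)
    losses = [loss(w)]
    for _ in range(steps):
        w = w - eta * grad(w)
        losses.append(loss(w))
    return w, losses


if __name__ == "__main__":
    for eta in (0.05, 0.6):
        w, losses = gd((2.0, 0.1), eta)
        print(f"eta={eta}: final w={np.round(w, 4)}, final loss={losses[-1]:.1e}, "
              f"sharpness={sharpness(w):.3f}, 2/eta={2 / eta:.3f}")
```

With $\eta = 0.05$ the run ends at a minimiser not far from its start, with sharpness of about 4.4, which exceeds $2/0.6 \approx 3.33$. With $\eta = 0.6$ the loss rises over the first two steps before the iterates land near $u = v = 1$, at a minimiser with sharpness close to 2, so the larger step ends at a flatter minimiser. In this toy, each GD step multiplies $u^2 - v^2$ by $1 - \eta^2 r^2$ with $r = uv - 1$, so large oscillating residuals shrink the imbalance and move the iterate along the hyperbola toward $u = v$. For $\eta > 1$ no minimiser is linearly stable, since every minimiser has sharpness at least $2 > 2/\eta$.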

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Model Reduction and Neural Networks · Neural Networks and Reservoir Computing