Gradient descent with adaptive stepsize converges (nearly) linearly under fourth-order growth
Damek Davis, Dmitriy Drusvyatskiy, Liwei Jiang

TL;DR
This paper shows that gradient descent with an adaptive stepsize converges at a local, nearly linear rate on smooth functions that exhibit only fourth-order growth away from their minimizer, challenging the belief that quadratic growth is necessary for linear convergence.
Contribution
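It introduces an adaptive stepsize scheme derived from a decomposition theorem: many short gradient steps are interlaced with a single long Polyak gradient step, which yields rapid local convergence on a broader class of functions than those with quadratic growth.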
Findings
Gradient descent with adaptive stepsize converges nearly linearly under fourth-order growth.
The proposed method exploits a manifold called the ravine for efficient optimization.
Empirical results on matrix sensing, factorization, and neural learning validate the theory.
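For reference, the two growth conditions contrasted above can be written as follows. This is a sketch in standard notation; the symbols $\bar{x}$ (the minimizer) and $c > 0$ (a growth constant) are assumptions introduced here for illustration and may differ from the paper's notation.

```latex
% Quadratic growth near the minimizer \bar{x} (the classical assumption):
f(x) - f(\bar{x}) \;\ge\; c\,\|x - \bar{x}\|^{2}
\qquad \text{for all } x \text{ near } \bar{x}.

% Fourth-order growth (the weaker assumption studied here):
f(x) - f(\bar{x}) \;\ge\; c\,\|x - \bar{x}\|^{4}
\qquad \text{for all } x \text{ near } \bar{x}.
```

Near $\bar{x}$, where $\|x - \bar{x}\| \le 1$, the fourth-order bound is the weaker of the two, so every function with quadratic growth also has fourth-order growth locally, but not conversely.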
Abstract
A prevalent belief among optimization specialists is that linear convergence of gradient descent is contingent on the function growing quadratically away from its minimizers. In this work, we argue that this belief is inaccurate. We show that gradient descent with an adaptive stepsize converges at a local (nearly) linear rate on any smooth function that merely exhibits fourth-order growth away from its minimizer. The adaptive stepsize we propose arises from an intriguing decomposition theorem: any such function admits a smooth manifold around the optimal solution -- which we call the ravine -- so that the function grows at least quadratically away from the ravine and has constant order growth along it. The ravine allows one to interlace many short gradient steps with a single long Polyak gradient step, which together ensure rapid convergence to the minimizer. We illustrate the theory…
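The interlacing idea described in the abstract can be illustrated with a minimal sketch. Everything specific below is an assumption made for illustration and is not taken from the paper: the toy objective `f(x, y) = 0.5*x**2 + 0.25*y**4`, the known optimal value `F_STAR = 0`, the fixed stepsize `eta = 0.5`, and the fixed counts of 20 rounds with 20 short steps each. The paper's actual adaptive schedule is not reproduced here. For this toy function the y-axis plays the role of the ravine: the function grows quadratically in `x` and only quartically along `y`. The long step uses the standard Polyak stepsize `(f - f*) / ||grad f||^2`.

```python
# Toy objective (an illustrative assumption, not from the paper):
# quadratic growth in x, fourth-order growth in y, minimizer at (0, 0).
def f(x, y):
    return 0.5 * x**2 + 0.25 * y**4


def grad(x, y):
    return x, y**3


F_STAR = 0.0  # optimal value, assumed known for the Polyak step


def short_step(x, y, eta=0.5):
    """One ordinary gradient step with a small constant stepsize."""
    gx, gy = grad(x, y)
    return x - eta * gx, y - eta * gy


def polyak_step(x, y):
    """One long step with the Polyak stepsize (f - f*) / ||grad f||^2."""
    gx, gy = grad(x, y)
    g2 = gx * gx + gy * gy
    if g2 == 0.0:  # already stationary; nothing to do
        return x, y
    t = (f(x, y) - F_STAR) / g2
    return x - t * gx, y - t * gy


def interlaced_gd(x, y, rounds=20, short_steps=20):
    """Alternate a batch of short steps with a single Polyak step."""
    for _ in range(rounds):
        for _ in range(short_steps):
            x, y = short_step(x, y)  # pull the iterate toward the ravine
        x, y = polyak_step(x, y)     # move a long way along the ravine
    return x, y


def plain_gd(x, y, steps):
    """Baseline: the same short step repeated, with no Polyak steps."""
    for _ in range(steps):
        x, y = short_step(x, y)
    return x, y


x0, y0 = 1.0, 1.0
xi, yi = interlaced_gd(x0, y0)
# Same gradient budget for the baseline: 20 rounds * (20 short + 1 long).
xg, yg = plain_gd(x0, y0, steps=20 * 21)
print(f"interlaced: f = {f(xi, yi):.3e}")
print(f"plain GD:   f = {f(xg, yg):.3e}")
```

With the same number of gradient evaluations, the interlaced run should reach a function value many orders of magnitude below plain gradient descent. The reason is visible on the ravine itself: at `x = 0` the Polyak step maps `y` to `y - (0.25*y**4 / y**6) * y**3 = 0.75*y`, a fixed contraction, whereas a constant-stepsize step `y - eta*y**3` shrinks `y` ever more slowly as `y` approaches zero. The short steps matter because the Polyak step can push the iterate away from the ravine in the `x` direction.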
Taxonomy
Topics: Advanced Thermodynamics and Statistical Mechanics · Micro and Nano Robotics · Stochastic processes and statistical mechanics
