Provable and Practical Online Learning Rate Adaptation with Hypergradient Descent
Ya-Chi Chu, Wenzhi Gao, Yinyu Ye, Madeleine Udell

TL;DR
This paper provides the first rigorous convergence analysis of the hypergradient descent method (HDM), showing that it adapts its stepsize to the local optimization landscape, achieves local superlinear convergence, and outperforms existing adaptive methods on deterministic convex problems.
Contribution
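The paper gives a theoretical convergence analysis of the hypergradient descent method (HDM), introduces two new variants with heavy-ball and Nesterov momentum, and shows empirically that the heavy-ball variant outperforms other adaptive methods while matching the quasi-Newton method L-BFGS.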
Findings
HDM automatically identifies the optimal stepsize for the local optimization landscape.
HDM with heavy-ball momentum (HDM-HB) outperforms other adaptive methods.
HDM-HB matches L-BFGS performance with less memory and computation.
Abstract
This paper investigates the convergence properties of the hypergradient descent method (HDM), a 25-year-old heuristic originally proposed for adaptive stepsize selection in stochastic first-order methods. We provide the first rigorous convergence analysis of HDM using the online learning framework of [Gao24] and apply this analysis to develop new state-of-the-art adaptive gradient methods with empirical and theoretical support. Notably, HDM automatically identifies the optimal stepsize for the local optimization landscape and achieves local superlinear convergence. Our analysis explains the instability of HDM reported in the literature and proposes efficient strategies to address it. We also develop two HDM variants with heavy-ball and Nesterov momentum. Experiments on deterministic convex problems show HDM with heavy-ball momentum (HDM-HB) exhibits robust performance and significantly…
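To make the idea of adapting the stepsize by its own gradient concrete, here is a minimal sketch of the classic scalar hypergradient heuristic. It is not the HDM variant analyzed in the paper, which builds on the online learning framework of [Gao24]. The function name `hypergradient_descent`, the parameters `alpha0` and `beta`, and the two-dimensional quadratic test problem are all illustrative assumptions.

```python
def hypergradient_descent(grad, x0, alpha0=0.01, beta=1e-4, steps=200):
    """Gradient descent whose scalar stepsize is adapted by a gradient step.

    Illustrative sketch only: names and default values are assumptions,
    not taken from the paper.
    """
    x = list(x0)
    alpha = alpha0
    prev_g = [0.0] * len(x)  # no previous gradient at the first iteration
    for _ in range(steps):
        g = grad(x)
        # Since x_t = x_{t-1} - alpha * g_{t-1}, the derivative of f(x_t)
        # with respect to alpha is -g_t . g_{t-1}. A descent step on alpha
        # therefore adds beta * (g_t . g_{t-1}).
        alpha += beta * sum(a * b for a, b in zip(g, prev_g))
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
        prev_g = g
    return x, alpha


def f(x):
    # Assumed test problem: an ill-conditioned convex quadratic.
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)


def grad_f(x):
    return [x[0], 10.0 * x[1]]


x_final, alpha_final = hypergradient_descent(grad_f, [1.0, 1.0])
print("final objective:", f(x_final))
print("final stepsize:", alpha_final)
```

On this problem the stepsize grows from its deliberately small initial value while consecutive gradients stay aligned. If `beta` is set too large the stepsize can overshoot and the iterates diverge, which is the kind of instability the abstract says the paper's analysis explains.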
Taxonomy
Topics: Online Learning and Analytics · Analog and Mixed-Signal Circuit Design
