Provable and Practical Online Learning Rate Adaptation with Hypergradient Descent

Ya-Chi Chu; Wenzhi Gao; Yinyu Ye; Madeleine Udell

arXiv:2502.11229 · math.OC · March 18, 2025

TL;DR

This paper provides the first rigorous convergence analysis of the hypergradient descent method (HDM). It shows that HDM adapts stepsizes effectively and achieves local superlinear convergence, and that HDM with heavy-ball momentum outperforms existing adaptive methods on deterministic convex optimization problems.

Contribution

It provides a theoretical convergence analysis of HDM, introduces new variants with heavy-ball and Nesterov momentum, and shows empirically that these variants outperform other adaptive methods and match quasi-Newton algorithms such as L-BFGS.
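The hypergradient idea behind HDM fits in a few lines. The derivation below is a sketch of the textbook scalar-stepsize form of the heuristic; the normalization by the squared gradient norm, the stepsize learning rate eta, and the heavy-ball extension are assumptions for illustration, not a transcription of the paper's algorithm.

```latex
% Plain gradient step with stepsize \alpha_k
x_{k+1} = x_k - \alpha_k \nabla f(x_k)

% Treat the next objective value as a function of the stepsize
\phi_k(\alpha) = f\bigl(x_k - \alpha \nabla f(x_k)\bigr),
\qquad
\phi_k'(\alpha_k) = -\bigl\langle \nabla f(x_{k+1}), \nabla f(x_k) \bigr\rangle

% Online gradient step on the stepsize, normalized by \|\nabla f(x_k)\|^2 (assumed)
\alpha_{k+1} = \alpha_k - \eta\,\frac{\phi_k'(\alpha_k)}{\|\nabla f(x_k)\|^2}
             = \alpha_k + \eta\,\frac{\langle \nabla f(x_{k+1}), \nabla f(x_k) \rangle}{\|\nabla f(x_k)\|^2}

% Worked case: f(x) = \tfrac{L}{2}\|x\|^2 gives \nabla f(x_{k+1}) = (1 - \alpha_k L)\,\nabla f(x_k), so
\alpha_{k+1} - \tfrac{1}{L} = (1 - \eta L)\bigl(\alpha_k - \tfrac{1}{L}\bigr)
\quad\Longrightarrow\quad \alpha_k \to \tfrac{1}{L} \ \text{ whenever } 0 < \eta L < 2

% Heavy-ball step and the hypergradient with respect to the momentum weight
x_{k+1} = x_k - \alpha_k \nabla f(x_k) + \beta_k (x_k - x_{k-1}),
\qquad
\frac{\partial f(x_{k+1})}{\partial \beta_k} = \bigl\langle \nabla f(x_{k+1}),\, x_k - x_{k-1} \bigr\rangle
```

The quadratic case shows in miniature why the stepsize settles at the locally ideal value 1/L: the hypergradient vanishes exactly when a gradient step lands on the minimizer along the current direction.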

Findings

01. HDM automatically identifies the optimal stepsize for the local optimization landscape.

02. On deterministic convex problems, HDM with heavy-ball momentum (HDM-HB) outperforms other adaptive methods.

03. HDM-HB matches L-BFGS performance while using less memory and computation.
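Finding 01 can be made concrete with a minimal Python sketch of scalar HDM, assuming NumPy and the normalized stepsize update alpha <- alpha + eta * <grad f(x_next), grad f(x)> / ||grad f(x)||^2. The function name hdm and its parameters (alpha0, eta, tol, max_iter) are hypothetical, and this is not the code in the udellgroup/hypergrad repository. On the isotropic quadratic f(x) = 2||x||^2, whose ideal stepsize is 1/L = 0.25, the stepsize climbs from 0.01 toward 0.25 and the per-step contraction factor shrinks at every iteration, a toy instance of local superlinear convergence.

```python
import numpy as np


def hdm(grad, x0, alpha0=0.01, eta=0.1, tol=1e-10, max_iter=1000):
    """Sketch of hypergradient descent with a normalized stepsize update.

    Assumption: the stepsize is tuned by online gradient descent on the
    surrogate (f(x - alpha * g) - f(x)) / ||g||^2, whose derivative in alpha
    is -<grad(x - alpha * g), g> / ||g||^2. Illustrative only.
    """
    x = np.asarray(x0, dtype=float)
    alpha = float(alpha0)
    g = grad(x)
    norms = [float(np.linalg.norm(g))]  # gradient norm at every iterate
    for _ in range(max_iter):
        gg = float(g @ g)
        if np.sqrt(gg) < tol:  # stop once the gradient is (numerically) zero
            break
        x_new = x - alpha * g  # plain gradient step with the current stepsize
        g_new = grad(x_new)
        # Online gradient step on the stepsize (minus the surrogate's derivative)
        alpha = alpha + eta * float(g_new @ g) / gg
        x, g = x_new, g_new
        norms.append(float(np.linalg.norm(g)))
    return x, alpha, norms


if __name__ == "__main__":
    # f(x) = 2 * ||x||^2 has gradient 4x, so L = 4 and the ideal stepsize is 0.25
    x, alpha, norms = hdm(lambda z: 4.0 * z, np.array([1.0, 1.0]))
    print(f"final stepsize {alpha:.4f} after {len(norms) - 1} iterations")
```

The toy problem is chosen because the dynamics can be checked by hand: with eta = 0.1 and L = 4 the stepsize obeys alpha_{k+1} = 0.6 * alpha_k + 0.1. The sketch omits the stabilization strategies the paper proposes, so on ill-conditioned problems the raw update may oscillate.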

Abstract

This paper investigates the convergence properties of the hypergradient descent method (HDM), a 25-year-old heuristic originally proposed for adaptive stepsize selection in stochastic first-order methods. We provide the first rigorous convergence analysis of HDM using the online learning framework of [Gao24] and apply this analysis to develop new state-of-the-art adaptive gradient methods with empirical and theoretical support. Notably, HDM automatically identifies the optimal stepsize for the local optimization landscape and achieves local superlinear convergence. Our analysis explains the instability of HDM reported in the literature and proposes efficient strategies to address it. We also develop two HDM variants with heavy-ball and Nesterov momentum. Experiments on deterministic convex problems show HDM with heavy-ball momentum (HDM-HB) exhibits robust performance and significantly…
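For the heavy-ball variant, a similar sketch can tune both the stepsize alpha and the momentum weight beta online. Everything here is an assumption for illustration: the hypothetical function hdm_hb, the normalized surrogate (f(x_next) - f(x)) / ||grad f(x)||^2 used for both hypergradients, the learning rates eta_alpha and eta_beta, and the clipping of beta to [0, beta_max]. It is not the paper's HDM-HB.

```python
import numpy as np


def hdm_hb(grad, x0, alpha0=0.01, beta0=0.0, eta_alpha=0.1, eta_beta=0.1,
           beta_max=0.99, tol=1e-10, max_iter=1000):
    """Sketch of hypergradient descent with heavy-ball momentum.

    Assumptions: alpha and beta both take online gradient steps on the
    normalized surrogate (f(x_next) - f(x)) / ||g||^2, and beta is clipped
    to [0, beta_max]. Defaults are hypothetical, not the paper's settings.
    """
    x = np.asarray(x0, dtype=float)
    x_prev = x.copy()  # no momentum on the first step
    alpha, beta = float(alpha0), float(beta0)
    g = grad(x)
    norms = [float(np.linalg.norm(g))]
    for _ in range(max_iter):
        gg = float(g @ g)
        if np.sqrt(gg) < tol:
            break
        d = x - x_prev  # momentum direction x_k - x_{k-1}
        x_new = x - alpha * g + beta * d  # heavy-ball step
        g_new = grad(x_new)
        # d(surrogate)/d(alpha) = -<g_new, g> / ||g||^2
        alpha = alpha + eta_alpha * float(g_new @ g) / gg
        # d(surrogate)/d(beta) = <g_new, d> / ||g||^2
        beta = beta - eta_beta * float(g_new @ d) / gg
        beta = min(max(beta, 0.0), beta_max)  # keep momentum in a safe range
        x_prev, x, g = x, x_new, g_new
        norms.append(float(np.linalg.norm(g)))
    return x, alpha, beta, norms
```

With eta_beta = 0 and beta0 = 0 the sketch reduces exactly to plain HDM. Its working memory is a few length-n vectors (the current and previous iterates and two gradients), whereas L-BFGS with history size m stores m pairs of length-n vectors; this is one plausible source of the memory advantage noted in the findings, not a statement about the paper's measurements.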

Code & Models

Repositories

udellgroup/hypergrad (official)

Videos

Provable and Practical Online Learning Rate Adaptation with Hypergradient Descent · slideslive

Taxonomy

Topics: Online Learning and Analytics · Analog and Mixed-Signal Circuit Design