Gradient Methods with Online Scaling Part I. Theoretical Foundations

Wenzhi Gao; Ya-Chi Chu; Yinyu Ye; Madeleine Udell

arXiv:2505.23081·math.OC·September 8, 2025

TL;DR

This paper introduces online scaled gradient methods (OSGM), a framework that adapts stepsizes using online learning, provably accelerating first-order methods and helping to explain hypergradient-descent heuristics used in machine learning optimization.

Contribution

It establishes the theoretical foundations of OSGM, demonstrating its convergence guarantees and superlinear rates, and connects it to practical hypergradient-descent heuristics.

Findings

01. Achieves trajectory-dependent global convergence on smooth convex functions.
02. Provides improved complexity bounds for strongly convex problems.
03. Exhibits local superlinear convergence, similar to quasi-Newton methods.

Abstract

This paper establishes the theoretical foundations of online scaled gradient methods (OSGM), a framework that uses online learning to adapt stepsizes and provably accelerate first-order methods. OSGM quantifies the effectiveness of a stepsize by a feedback function motivated by a convergence measure and uses that feedback to adjust the stepsize through an online learning algorithm. Consequently, instantiations of OSGM achieve convergence rates that are asymptotically no worse than that of the optimal stepsize. OSGM yields desirable convergence guarantees on smooth convex problems, including 1) trajectory-dependent global convergence on smooth convex objectives; 2) an improved complexity result on smooth strongly convex problems; and 3) local superlinear convergence. Notably, OSGM constitutes a new family of first-order methods with non-asymptotic superlinear convergence, joining the…
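The feedback loop described in the abstract can be illustrated with a minimal sketch. It makes several assumptions that are not taken from this page: a single scalar stepsize `p`, a diagonal quadratic objective, a hypergradient-style feedback `h(p) = (f(x - p*g) - f(x)) / ||g||^2`, plain online gradient descent on `p` with learning rate `eta`, and a monotone safeguard that rejects trial points that increase `f`. The function name `osgm_scalar` and all parameters are hypothetical; the paper's actual feedback functions and online learners may differ.

```python
def osgm_scalar(a, x0, eta=0.1, iters=200):
    """Illustrative sketch: gradient descent on f(x) = 0.5 * sum(a_i * x_i^2)
    with a scalar stepsize p that is adapted by online gradient descent."""

    def f(x):
        return 0.5 * sum(ai * xi * xi for ai, xi in zip(a, x))

    def grad(x):
        return [ai * xi for ai, xi in zip(a, x)]

    x, p = list(x0), 0.0          # start from a zero stepsize
    history = [f(x)]              # objective value after each iteration
    for _ in range(iters):
        g = grad(x)
        gg = sum(gi * gi for gi in g)
        if gg == 0.0:             # stationary point reached
            break

        # Play the current stepsize and observe the trial point.
        x_trial = [xi - p * gi for xi, gi in zip(x, g)]
        g_trial = grad(x_trial)

        # Derivative of the assumed feedback
        #   h(p) = (f(x - p*g) - f(x)) / ||g||^2
        # with respect to p, evaluated at the stepsize just played.
        dh = -sum(gt * gi for gt, gi in zip(g_trial, g)) / gg

        # Online gradient descent step on the stepsize.
        p -= eta * dh

        # Monotone safeguard (an assumption of this sketch):
        # keep the trial point only if it does not increase f.
        if f(x_trial) <= f(x):
            x = x_trial
        history.append(f(x))
    return x, p, history


# Example run on a small ill-conditioned quadratic (values chosen arbitrarily).
x_final, p_final, history = osgm_scalar([1.0, 4.0, 10.0], [1.0, 1.0, 1.0])
print(history[0], history[-1], p_final)
```

In this sketch `eta = 0.1` equals the reciprocal of the largest curvature in the example, which keeps the stepsize update stable here; other problems would need a different choice.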

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Optimization Algorithms Research · Advanced Bandit Algorithms Research