First-ish Order Methods: Hessian-aware Scalings of Gradient Descent

Oscar Smee; Fred Roosta; Stephen J. Wright

arXiv:2502.03701·math.OC·July 15, 2025

Open Access

TL;DR

This paper introduces a Hessian-aware scaling of gradient descent that sets the step along the gradient from the curvature in that direction, reducing the need for line searches or heuristic learning-rate tuning in large-scale machine learning optimization.

Contribution

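It proposes a Hessian-aware scaling of the gradient direction that guarantees a local unit step size, even in nonconvex settings, and achieves linear convergence near local minima that satisfy the second-order sufficient conditions. Global convergence is shown under a smoothness assumption significantly weaker than the standard Lipschitz one.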

Findings

01. Method achieves linear convergence near local minima.

02. Global convergence is proven under weaker smoothness conditions.

03. Empirical validation shows improved performance on machine learning tasks.

Abstract

Gradient descent is the primary workhorse for optimizing large-scale problems in machine learning. However, its performance is highly sensitive to the choice of the learning rate. A key limitation of gradient descent is its lack of natural scaling, which often necessitates expensive line searches or heuristic tuning to determine an appropriate step size. In this paper, we address this limitation by incorporating Hessian information to scale the gradient direction. By accounting for the curvature of the function along the gradient, our adaptive, Hessian-aware scaling method ensures a local unit step size guarantee, even in nonconvex settings. Near a local minimum that satisfies the second-order sufficient conditions, our approach achieves linear convergence with a unit step size. We show that our method converges globally under a significantly weaker version of the standard Lipschitz…
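
The idea of scaling the gradient by the curvature along it can be illustrated with a small sketch. The abstract does not state the scaling formula, so the rule used here, s = (gᵀg)/(gᵀHg), which minimizes the local quadratic model along the negative gradient, is an assumption chosen for illustration and not necessarily the paper's rule. The function names, the fixed fallback for nonpositive curvature, and the toy quadratic are likewise hypothetical.

```python
# Illustrative sketch only: one plausible curvature-based scaling of the
# gradient. The paper's actual scalings and safeguards may differ.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def scaled_gradient_step(x, grad, hess_vec, fallback=1e-2):
    """Take one step x - s*g, where s = (g.g)/(g.Hg) if g.Hg > 0."""
    g = grad(x)
    gg = dot(g, g)
    if gg == 0.0:
        return x  # gradient is zero: x is already stationary
    curvature = dot(g, hess_vec(x, g))  # g^T H g via a Hessian-vector product
    # Assumption: use a small fixed scaling when the curvature along g is
    # not positive (a placeholder, not the paper's nonconvex treatment).
    s = gg / curvature if curvature > 0.0 else fallback
    return [xi - s * gi for xi, gi in zip(x, g)]

# Toy problem (hypothetical): f(x) = 0.5 * x^T A x with A = diag(1, 10).
A = [[1.0, 0.0], [0.0, 10.0]]

def f(x):
    return 0.5 * dot(x, matvec(A, x))

def grad(x):
    return matvec(A, x)

def hess_vec(x, v):
    return matvec(A, v)  # Hessian is constant for a quadratic

x = [1.0, 1.0]
history = [f(x)]
for _ in range(50):
    x = scaled_gradient_step(x, grad, hess_vec)
    history.append(f(x))

print(history[0], history[-1])
```

No learning rate is supplied anywhere in this loop: the scaled direction is applied with a unit step, and only a Hessian-vector product is needed, never the full Hessian.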

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Model Reduction and Neural Networks