Gradient-Normalized Smoothness for Optimization with Approximate Hessians
Andrei Semenov, Martin Jaggi, Nikita Doikov

TL;DR
This paper introduces Gradient-Normalized Smoothness, a new concept that enables optimization algorithms with approximate Hessians to achieve fast, global convergence across various function classes, including non-convex problems.
Contribution
It proposes Gradient-Normalized Smoothness, a universal measure linking Hessian approximation to gradient behavior, leading to algorithms with guaranteed global convergence rates for diverse objectives.
Findings
Achieves state-of-the-art convergence rates for functions with Hölder-continuous Hessians.
Provides global linear rates for logistic regression and softmax with approximate Hessians.
Extends to non-convex optimization using Fisher and Gauss-Newton approximations.
Abstract
In this work, we develop new optimization algorithms that use approximate second-order information combined with the gradient regularization technique to achieve fast global convergence rates for both convex and non-convex objectives. The key innovation of our analysis is a novel notion called Gradient-Normalized Smoothness, which characterizes the maximum radius of a ball around the current point that yields a good relative approximation of the gradient field. Our theory establishes a natural intrinsic connection between Hessian approximation and the linearization of the gradient. Importantly, Gradient-Normalized Smoothness does not depend on the specific problem class of the objective functions, while effectively translating local information about the gradient field and Hessian approximation into the global behavior of the method. This new concept equips approximate second-order…
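The gradient regularization idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes one common form of the step, x+ = x - (H + (||g|| / γ) I)^{-1} g, where H is an approximate Hessian, g is the gradient, and γ > 0 plays the role of a local smoothness radius. The exact update rule and the way γ is chosen in the paper may differ. The names `grad_reg_newton_step`, `minimize_quadratic`, and `hess_scale` are hypothetical, and the Hessian approximation is restricted to a diagonal matrix purely to keep the example dependency-free.

```python
import math


def grad_reg_newton_step(x, grad, hess_diag, gamma):
    """Return x - (H + (||g|| / gamma) * I)^{-1} g for a diagonal H >= 0.

    Assumed form of a gradient-regularized step; hess_diag holds the
    diagonal of an (approximate) positive semidefinite Hessian.
    """
    g_norm = math.sqrt(sum(g * g for g in grad))
    if g_norm == 0.0:
        return list(x)  # already stationary, so no step is taken
    lam = g_norm / gamma  # regularization shrinks as the gradient vanishes
    return [xi - gi / (hi + lam) for xi, gi, hi in zip(x, grad, hess_diag)]


def minimize_quadratic(a, x0, gamma=1.0, hess_scale=2.0, iters=100):
    """Minimize f(x) = 0.5 * sum(a_i * x_i^2) with an inexact Hessian.

    The true Hessian is diag(a); the method only sees hess_scale * diag(a),
    a deliberately inexact (overestimated) curvature for illustration.
    """
    x = list(x0)
    for _ in range(iters):
        grad = [ai * xi for ai, xi in zip(a, x)]
        hess_diag = [hess_scale * ai for ai in a]
        x = grad_reg_newton_step(x, grad, hess_diag, gamma)
    return x
```

Calling `minimize_quadratic([1.0, 10.0], [1.0, 1.0])` drives the iterate toward the minimizer at the origin even though the curvature is overestimated by a factor of two. In this toy setting each coordinate contracts by a factor that approaches 1/2 as the gradient, and with it the regularization, goes to zero.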
Peer Reviews
Decision: ICLR 2026 Poster
**1. Unified Theoretical Framework**: The introduction and analysis of Gradient-Normalized Smoothness offers a conceptually appealing unification of smoothness and Hessian approximation error, allowing the same framework to recover or extend the best known rates for several classes of optimization problems. **2. Strong Experimental and Visual Evidence**: The paper provides thorough experimental validation with clear visualizations, which is easy to follow, and I like the layout of the paper (e
I am not an expert in optimization; therefore, from my perspective, the paper does not exhibit any specific weaknesses. I will finalize my rating after reviewing other reviewers’ comments and the authors’ responses.
1. The paper introduces a powerful new theoretical framework for developing optimization algorithms that leverage approximate second-order information while guaranteeing fast global convergence for both convex and non-convex objectives. The central innovation is the Gradient-Normalized Smoothness (GNS), a novel, universal notion that locally characterizes the maximum radius of a ball around the current point where the gradient field is well-approximated. This concept provides a unified mechanism
1. The notion of Gradient-Normalized Smoothness is defined mathematically but not well-motivated in terms of optimization geometry or curvature behavior. The crucial element appears to be the specific gradient-based normalization ((1/γ) ||g||* ||h||). The paper should elaborate on why this specific normalization is the key that unlocks the unified analysis. It is also unclear how this notion of smoothness compares to established notions like relative or anisotropic smoothness. 3. The theoretic
The paper is very well-written, accurate, and clear. The results are correct, new, and interesting for the community. It is an outstanding paper. I list several strengths of the paper: 1. The paper proposes new concepts of smoothness that generalize and easily connect many well-known and modern classes of functions. For this class of gradient-normalized smooth functions, they provide examples of functions satisfying it. For classical function classes, they provide a lower bound on the smoothnes
I couldn’t find any major weakness in the paper. Minor: 1. While the numerical experiments show the effectiveness of the Gradient-Regularized Newton method with the proposed stepsizes, it would be beneficial to compare it with other adaptive or universal methods, both with exact and inexact Hessians.
Taxonomy
Topics: Advanced Numerical Analysis Techniques · Numerical methods in inverse problems · Mathematical Biology Tumor Growth
Methods: Logistic Regression · Softmax
