Nesterov Finds GRAAL: Optimal and Adaptive Gradient Method for Convex Optimization

Ekaterina Borodich; Dmitry Kovalev

arXiv:2507.09823·math.OC·September 1, 2025

Open Access 3 Reviews

TL;DR

This paper introduces an accelerated version of the GRAAL algorithm that adapts to local curvature and achieves near-optimal convergence rates for convex optimization, improving upon previous methods.

Contribution

The authors develop a Nesterov-accelerated GRAAL algorithm that attains optimal convergence rates while maintaining adaptive stepsize estimation without line search.

Findings

01

Achieves near-optimal iteration complexity for L-smooth functions.

02

Lets the stepsize grow at a linear (geometric) rate while adapting to local curvature.

03

Works under general (L0,L1)-smoothness assumptions.
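
For reference, a commonly used form of the $(L_0,L_1)$-smoothness condition for twice-differentiable functions is given below. The paper may state the assumption differently (for example, in a first-order form), so this is an assumption about the setting rather than the authors' exact condition.

$$\|\nabla^2 f(x)\| \le L_0 + L_1 \|\nabla f(x)\| \quad \text{for all } x.$$

Setting $L_1 = 0$ recovers standard $L$-smoothness with $L = L_0$.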

Abstract

In this paper, we focus on the problem of minimizing a continuously differentiable convex objective function, $\min_{x} f(x)$. Recently, Malitsky (2020); Alacaoglu et al. (2023) developed an adaptive first-order method, GRAAL. This algorithm computes stepsizes by estimating the local curvature of the objective function without any line search procedures or hyperparameter tuning, and attains the standard iteration complexity $O(L \|x_{0} - x^{*}\|^{2} / \epsilon)$ of fixed-stepsize gradient descent for $L$-smooth functions. However, a natural question arises: is it possible to accelerate the convergence of GRAAL to match the optimal complexity $O(\sqrt{L \|x_{0} - x^{*}\|^{2} / \epsilon})$ of the accelerated gradient descent of Nesterov (1983)? Although some attempts have been made by Li and Lan (2025); Suh and Ma (2025), the ability of existing accelerated algorithms to…
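
To make the idea of estimating stepsizes from local curvature concrete, here is a minimal Python sketch. It is not the GRAAL update and not the accelerated method of the paper. It is plain, non-accelerated gradient descent with a well-known curvature-based stepsize rule: the step is the smaller of a growth cap and the inverse-curvature estimate $\|x_k - x_{k-1}\| / (2\|\nabla f(x_k) - \nabla f(x_{k-1})\|)$. The function name `curvature_adaptive_gd`, the toy quadratic, and the defaults `step0`, `iters`, `tol` are all hypothetical choices made for illustration.

```python
import numpy as np

def curvature_adaptive_gd(grad, x0, step0=1e-6, iters=5000, tol=1e-8):
    """Gradient descent with a stepsize estimated from local curvature.

    Illustrative sketch only: NOT GRAAL and NOT the paper's accelerated method.
    """
    x_prev = np.asarray(x0, dtype=float)
    g_prev = grad(x_prev)
    step_prev = step0
    theta = 1e9  # assumption: large initial ratio so the first step may grow freely
    x = x_prev - step_prev * g_prev
    steps = [step_prev]
    for _ in range(iters):
        g = grad(x)
        if np.linalg.norm(g) <= tol:  # stop once the gradient is small
            break
        dx = np.linalg.norm(x - x_prev)
        dg = np.linalg.norm(g - g_prev)
        # Inverse local-curvature estimate; infinite if the gradient did not change.
        curv_step = dx / (2.0 * dg) if dg > 0.0 else np.inf
        # Growth cap keeps the step from increasing too fast between iterations.
        step = min(np.sqrt(1.0 + theta) * step_prev, curv_step)
        x_prev, g_prev = x, g
        x = x - step * g
        theta, step_prev = step / step_prev, step
        steps.append(step)
    return x, steps

# Hypothetical toy problem: f(x) = 0.5 * sum(h * (x - x_star)**2), so L = 9.
h = np.array([1.0, 4.0, 9.0])
x_star = np.ones(3)
grad = lambda x: h * (x - x_star)

x, steps = curvature_adaptive_gd(grad, np.zeros(3))
print("first stepsizes:", steps[:3])
print("distance to minimizer:", np.linalg.norm(x - x_star))
```

Starting from a deliberately tiny `step0`, the stepsize reaches the scale of $1/L$ within a couple of iterations, with no line search and no knowledge of $L$. The sketch does not reproduce the coupling step or the acceleration that the paper contributes.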

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

- As clearly summarized in Table 1, this paper proposes the first adaptive accelerated method that achieves the optimal rate under convexity and $(L_0,L_1)$-smoothness.
- Even under the standard smoothness assumption, the proposed adaptive accelerated method allows the step size to grow linearly, unlike existing methods, which is particularly beneficial when the initial step size is small.

Weaknesses

- Although this is primarily a theoretical paper, including a simple toy experiment would make the contribution more informative. (This is not a requirement for acceptance, but a suggestion to strengthen the paper.)
- I would appreciate it if Section 4.1 provided a higher-level sketch of the main idea.

Reviewer 02 · Rating 6 · Confidence 3

Strengths

1. The additional coupling step is a novel technique that elegantly enables acceleration and adaptive step-size growth in one framework.
2. The analysis is layered and rigorous: a general potential-descent framework for convex $C^1$ objectives, followed by rate results for $L$-smooth and $(L_0,L_1)$-smooth settings. The bounds are near-optimal and adapt to unknown curvature, with only logarithmic dependence on the initial stepsize guess in the $L$-smooth case.
3. The paper clearly states its

Weaknesses

1. Lack of empirical validation. The paper presents no experiments. Small-scale benchmarks (logistic regression, least-squares, GLMs, robust convex losses) would verify geometric step-size growth in practice, overhead of computing $\Lambda$, and wall-clock speedups versus other algorithms.
2. The theory requires universal constants $(\theta, \gamma, \nu)>0$ satisfying equation (19), but the paper does not provide guidance on how to choose these parameters in practice. This is a significant pra

Reviewer 03 · Rating 10 · Confidence 5

Strengths

The paper resolves a key limitation of explicit (backtracking-free) adaptive accelerated methods: they cannot increase the stepsize quickly. It also provides a detailed comparison with related work.

Weaknesses

Although the experimental results would be quite predictable, it would be interesting to see a comparison with AdProxGD and with optimizers that use backtracking, beyond the standard smoothness setting. This is not a real weakness, only a suggestion for the sake of comprehensiveness; the theoretical contribution is solid enough.

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Advanced Optimization Algorithms Research