Adaptive Accelerated Gradient Descent Methods for Convex Optimization

Zeyi Xu; Long Chen

arXiv:2601.19013·math.OC·February 10, 2026

Open Access · 3 Reviews

TL;DR

The paper introduces A$^2$GD, an adaptive accelerated gradient method for convex optimization that reduces gradient evaluations and maintains strong convergence through Lyapunov-based adaptivity and stability-inspired line search.

Contribution

It presents a novel adaptive accelerated gradient method that dynamically updates smoothness constants and triggers line search based on stability analysis, improving efficiency over existing methods.

Findings

01 · Reduces gradient evaluations compared to traditional methods.

02 · Achieves strong convergence guarantees with adaptive step size.

03 · Outperforms existing first-order methods in various convex problems.

Abstract

This work proposes A$^2$GD, a novel adaptive accelerated gradient descent method for convex and composite optimization. Smoothness and convexity constants are updated via Lyapunov analysis. Inspired by stability analysis in ODE solvers, the method triggers line search only when accumulated perturbations become positive, thereby reducing gradient evaluations while preserving strong convergence guarantees. By integrating adaptive step size and momentum acceleration, A$^2$GD outperforms existing first-order methods across a range of problem settings.
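
The triggering idea in the abstract can be illustrated with a minimal sketch. This is not the paper's Algorithm 1: it uses plain gradient descent without momentum, and the per-step perturbation `b`, the running sum `p`, the `shrink` factor, and all function names are assumptions chosen for illustration. Here `b = f(x_new) - f(x) + ||g||^2 / (2L)` measures how far one step misses the usual sufficient-decrease condition, and backtracking on the smoothness estimate `L` runs only when the accumulated sum `p + b` would become positive.

```python
def trial_step(f, x, fx, g, g2, L):
    """Take a step of size 1/L and return the new point, its value,
    and the perturbation b (b <= 0 means sufficient decrease holds)."""
    x_new = [xi - gi / L for xi, gi in zip(x, g)]
    f_new = f(x_new)
    b = f_new - fx + g2 / (2.0 * L)
    return x_new, f_new, b


def triggered_gd(f, grad, x0, L0=1.0, shrink=0.9, iters=200, tol=1e-8):
    """Gradient descent that backtracks only when the accumulated
    perturbation would become positive (illustrative sketch only)."""
    x = list(x0)
    fx = f(x)
    L = L0        # current smoothness estimate (hypothetical update rule)
    p = 0.0       # accumulated perturbation; the sketch keeps p <= 0
    searches = 0  # iterations in which backtracking was triggered
    for _ in range(iters):
        g = grad(x)
        g2 = sum(gi * gi for gi in g)
        if g2 <= tol * tol:
            break
        x_new, f_new, b = trial_step(f, x, fx, g, g2, L)
        if p + b > 0.0:
            # Budget exhausted: double L until this step alone satisfies
            # sufficient decrease (bounded number of doublings).
            searches += 1
            doublings = 0
            while b > 0.0 and doublings < 60:
                L *= 2.0
                doublings += 1
                x_new, f_new, b = trial_step(f, x, fx, g, g2, L)
        p += b
        x, fx = x_new, f_new
        L *= shrink  # optimistic decrease of the estimate after each step
    return x, fx, p, searches


# Toy convex quadratic with smoothness constant 10 (assumed test problem).
def f(x):
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)


def grad(x):
    return [x[0], 10.0 * x[1]]


x, fx, p, searches = triggered_gd(f, grad, [1.0, 1.0])
print(fx, p, searches)
```

Summing the definition of `b` over the iterations gives $f(x_K) = f(x_0) - \sum_k \|g_k\|^2/(2L_k) + p_K$, so keeping $p_K \leq 0$ is enough to retain an overall decrease even when individual steps violate the per-step condition. That is the sense in which a step-by-step line search can be more conservative than necessary.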

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 3

Strengths

Because line search can be computationally expensive, there has been considerable interest in developing line-search-free adaptive methods. In this context, proposing a method that triggers line search only occasionally, when it is truly needed, is novel and worth investigating.

Weaknesses

Although I found the analysis and the results in Figures 1 and 2 for gradient descent interesting, the proposed Algorithm 1 introduces several additional components, such as initialization with ten iterations of gradient descent and restarting, which make it difficult to isolate the source of improvement, especially in the experimental results. In other words, I would like to see a fair comparison focusing solely on the line-search-free aspect, without incorporating other auxiliary components. F

Reviewer 02 · Rating 6 · Confidence 3

Strengths

- The paper proposes an interesting "line search-reduced" adaptive acceleration method through the elegant use of a running perturbation balance $p_k$ to trigger line search only when the Lyapunov-stability condition $p_k \leq 0$ is violated.
- The paper provides clear convergence guarantees for both convex and strongly convex objectives. The way $L_k$ and $\mu_k$ are adaptively controlled makes sense and the Lyapunov-based analysis is well presented.
- It's nice to see a clean algorithmic i

Weaknesses

- There's no theoretical upper bound on the number of line search activations. While each backtracking loop is shown to finish in $O(\log L)$ steps, the paper doesn't say how often these activations happen overall. In the worst case, frequent triggers could blow up the total gradient evaluations and weaken the claimed overall complexity.
- The experimental evaluation is somewhat limited. Results are shown only in terms of gradient evaluations; no wall-clock time, memory or cost breakdown (e.g.

Reviewer 03 · Rating 4 · Confidence 3

Strengths

The proposed method is relatively simple and exploits a natural observation: the standard analysis contains several error terms, and forcing every one of them to be negative may be too conservative and costly. The proofs build largely on standard arguments, with one new argument for keeping track of the sum of the error terms. The experiments show strong performance compared with other theoretical adaptive methods and untuned non-adaptive methods.

Weaknesses

The accelerated result has two new hyperparameters, R and epsilon, compared with the standard method, which negates the adaptivity advantage of the method. The new method contains many heuristics (heuristic line search, warm start to set mu) and several extra hyperparameters. In the experimental comparison, it looks like the non-adaptive methods are not tuned at all, leading to oscillating behavior. This is already observed in the literature, e.g. O'Donoghue, Candes. Adaptive Restart for A

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Optimization Algorithms Research · Sparse and Compressive Sensing Techniques