Accelerated Gradient Descent for Faster Convergence with Minimal Overhead

Manuel Graca; L. Miguel Silveira; Arlindo Oliveira; Frank Liu

arXiv:2605.16017·cs.LG·May 18, 2026

TL;DR

This paper introduces CT-AGD, a curvature-aware accelerated gradient method that uses local curvature estimates to reduce the number of training epochs in deep learning, with storage and computational overhead comparable to Adam.

Contribution

The paper proposes CT-AGD, a novel optimization algorithm that accelerates first-order methods in non-convex deep learning tasks with minimal additional computational cost.

Findings

01

Reduces training epochs by 33% on average.

02

Achieves comparable accuracy to baseline methods.

03

Keeps storage and computational overhead comparable to Adam.

Abstract

In this paper, we present CT-AGD (Curvature-Tuned Accelerated Gradient Descent), an optimization method for non-convex optimization problems arising in deep learning training. CT-AGD is a general boosting procedure that accelerates first-order methods by explicitly capturing local curvature with finite-difference quotients and by applying heuristics that mitigate the noise and bias introduced by stochastic mini-batch training. Its storage and computational overhead is comparable to that of adaptive gradient methods such as Adam. Our extensive experiments demonstrate that CT-AGD achieves the same level of accuracy as the baseline first-order methods while reducing the required training epochs by 33% on average.
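The abstract does not spell out the update rule, so the following is only a minimal sketch of what a finite-difference quotient for local curvature can look like, not the CT-AGD algorithm itself. It assumes a per-coordinate secant estimate built from two consecutive iterates and their gradients; the names `secant_curvature` and `grad` and the toy quadratic are hypothetical and chosen for illustration.

```python
def secant_curvature(x_prev, x_curr, g_prev, g_curr, eps=1e-12):
    """Per-coordinate finite-difference quotient (g_curr - g_prev) / (x_curr - x_prev).

    Illustrative assumption: a generic secant estimate, not the paper's exact rule.
    """
    est = []
    for xp, xc, gp, gc in zip(x_prev, x_curr, g_prev, g_curr):
        dx = xc - xp
        # A coordinate that did not move carries no curvature information; use 0.0.
        est.append((gc - gp) / dx if abs(dx) > eps else 0.0)
    return est


def grad(x, diag=(1.0, 10.0)):
    # Gradient of the toy quadratic f(x) = 0.5 * sum(d_i * x_i^2).
    return [d * xi for d, xi in zip(diag, x)]


x0 = [1.0, 1.0]
g0 = grad(x0)
x1 = [xi - 0.05 * gi for xi, gi in zip(x0, g0)]  # one plain gradient step
g1 = grad(x1)

# On a quadratic with a diagonal Hessian, the quotient recovers the diagonal entries.
curvature = secant_curvature(x0, x1, g0, g1)
print(curvature)
```

On this deterministic toy problem the estimate matches the true curvature in each coordinate. With mini-batches, `g_prev` and `g_curr` would typically come from different samples, so the quotient becomes noisy and possibly biased, which is the issue the heuristics mentioned in the abstract are meant to address.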
