A Parameter-Free First-Order Algorithm for Non-Convex Optimization with $\tilde{\mkern1mu O}(\epsilon^{-5/3})$ Global Rate

Sichao Xiong; Sadok Jerad; Coralia Cartis

arXiv:2605.02127·math.OC·May 5, 2026

TL;DR

We propose PF-AGD, a parameter-free, deterministic, accelerated first-order method that achieves the best-known $O(\epsilon^{-5/3}\log(1/\epsilon))$ oracle complexity for smooth non-convex optimization without prior knowledge of smoothness constants.

Contribution

We introduce PF-AGD, the first parameter-free accelerated method attaining the best-known rate for sufficiently smooth non-convex functions. It replaces a priori smoothness constants with an adaptive backtracking scheme and a gradient-based restart mechanism.

Findings

01

PF-AGD achieves the best-known $O(\epsilon^{-5/3}\log(1/\epsilon))$ oracle complexity bound for first-order methods on smooth non-convex objectives.

02

Empirically, PF-AGD outperforms other parameter-free variants and the practical variant of AGD-Until-Guilty (Carmon et al., 2017).

03

PF-AGD is a viable alternative to nonlinear conjugate gradient methods.

Abstract

We introduce PF-AGD, the first parameter-free, deterministic, accelerated first-order method to achieve an $O(\epsilon^{-5/3}\log(1/\epsilon))$ oracle complexity bound when minimizing sufficiently smooth, non-convex functions; this is the best-known bound for first-order methods on smooth non-convex objectives. Unlike existing methods with this rate, which require a priori knowledge of smoothness constants, we use an adaptive backtracking scheme and a gradient-based restart mechanism to estimate local curvature. This yields a practical algorithm that matches the best-known theoretical rates. Empirically, PF-AGD outperforms the practical variant of AGD-Until-Guilty (Carmon et al., 2017), as well as other parameter-free variants, and is a viable alternative to nonlinear conjugate gradient methods.
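
To make the two ingredients named in the abstract concrete, the sketch below combines a textbook Nesterov-style accelerated gradient step with (i) backtracking on a running smoothness estimate and (ii) a gradient-based restart test. It is not PF-AGD: the abstract does not specify the algorithm's update rules, so the function name `agd_backtracking_restart`, the parameters `L0`, `tol`, and `max_iter`, the doubling and halving of `L`, and the extra function-value safeguard are all assumptions chosen for illustration, and the sketch carries none of the paper's complexity guarantees.

```python
import numpy as np


def agd_backtracking_restart(f, grad, x0, L0=1.0, tol=1e-6, max_iter=10_000):
    """Illustrative accelerated gradient loop (NOT the paper's PF-AGD).

    f, grad : callables returning the objective value and its gradient.
    L0      : initial guess for the local smoothness constant (assumption).
    Returns (point, iterations_used).
    """
    x = np.asarray(x0, dtype=float)  # previous main iterate
    y = x.copy()                     # extrapolated (momentum) point
    L = float(L0)                    # running smoothness estimate
    t = 1.0                          # Nesterov momentum counter

    for k in range(max_iter):
        g = grad(y)
        if np.linalg.norm(g) <= tol:
            return y, k
        fy = f(y)
        gg = float(np.dot(g, g))

        # Backtracking: double L until the gradient step from y gives
        # sufficient decrease (capped at 60 doublings as a safety net).
        for _ in range(60):
            x_new = y - g / L
            f_new = f(x_new)
            if f_new <= fy - gg / (2.0 * L):
                break
            L *= 2.0

        # Candidate momentum step.
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y_cand = x_new + ((t - 1.0) / t_new) * (x_new - x)

        # Gradient-based restart: drop momentum when the last step moved
        # against the descent direction. The function-value check is an
        # extra safeguard added for this sketch only.
        if np.dot(g, x_new - x) > 0.0 or f(y_cand) > f_new:
            y, t = x_new, 1.0
        else:
            y, t = y_cand, t_new

        x = x_new
        L = max(0.5 * L, 1e-12)  # let the estimate shrink again next round

    return y, max_iter
```

Halving `L` after each accepted step lets the estimate track local rather than global curvature, which is the general motivation for backtracking when smoothness constants are unknown; the specific schedule here is a common heuristic, not one taken from the paper.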
