A Parameter-Free First-Order Algorithm for Non-Convex Optimization with $\tilde{\mkern1mu O}(\epsilon^{-5/3})$ Global Rate
Sichao Xiong, Sadok Jerad, Coralia Cartis

TL;DR
We propose PF-AGD, a parameter-free, deterministic, accelerated first-order method that achieves the best-known $O(\epsilon^{-5/3}\log(1/\epsilon))$ rate for smooth non-convex optimization without prior knowledge of smoothness constants.
Contribution
We introduce PF-AGD, the first parameter-free accelerated method to attain the best-known rate for smooth non-convex functions. It replaces fixed, a priori smoothness parameters with adaptive backtracking and gradient-based restarts.
Findings
PF-AGD achieves the best-known $O(\epsilon^{-5/3}\log(1/\epsilon))$ oracle complexity bound.
Empirically, PF-AGD outperforms the practical variant of AGD-Until-Guilty as well as other parameter-free variants.
PF-AGD is a practical alternative to nonlinear conjugate gradient methods.
Abstract
We introduce PF-AGD, the first parameter-free, deterministic, accelerated first-order method to achieve the $O(\epsilon^{-5/3}\log(1/\epsilon))$ oracle complexity bound when minimizing sufficiently smooth, non-convex functions; this is the best-known bound for first-order methods on smooth non-convex objectives. Unlike existing methods with this rate, which require a priori knowledge of smoothness constants, PF-AGD uses an adaptive backtracking scheme and a gradient-based restart mechanism to estimate local curvature. This yields a practical algorithm that matches the best-known theoretical rate. Empirically, PF-AGD outperforms the practical variant of AGD-Until-Guilty (Carmon et al., 2017), as well as other parameter-free variants, and is a viable alternative to nonlinear conjugate gradient methods.
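The abstract names two ingredients, a backtracking estimate of curvature and a gradient-based restart, without giving their exact form. The sketch below shows one common way to combine these ideas in a generic accelerated gradient loop. It is not PF-AGD and does not reproduce its complexity guarantee. It assumes NumPy, a FISTA-style momentum sequence, doubling/halving of a curvature estimate `L`, and the gradient restart test of O'Donoghue and Candès; the function name `agd_backtracking_restart` and its parameters are hypothetical.

```python
import numpy as np


def agd_backtracking_restart(f, grad, x0, L0=1.0, eps=1e-6, max_iter=10_000):
    """Hypothetical sketch (not PF-AGD): Nesterov-style accelerated gradient
    with a backtracking estimate of the Lipschitz constant L and a
    gradient-based momentum restart. Returns (x, iterations used)."""
    x = np.asarray(x0, dtype=float)
    y = x.copy()
    t = 1.0
    L = L0
    k = 0
    for k in range(max_iter):
        # Stop once the current iterate is an eps-stationary point.
        if np.linalg.norm(grad(x)) <= eps:
            break
        g = grad(y)
        fy = f(y)
        # Backtracking: double L until the sufficient-decrease test
        # f(y - g/L) <= f(y) - ||g||^2 / (2L) holds at the momentum point y.
        while True:
            x_new = y - g / L
            if f(x_new) <= fy - np.dot(g, g) / (2.0 * L):
                break
            L *= 2.0
        # FISTA-style update of the momentum parameter.
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        # Gradient restart (assumed O'Donoghue-Candes test): if the step
        # points uphill w.r.t. the gradient at y, discard the momentum.
        if np.dot(g, x_new - x) > 0:
            y = x_new.copy()
            t_new = 1.0
        else:
            y = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
        # Let the curvature estimate shrink again so it can track local smoothness.
        L /= 2.0
    return x, k


# Usage on an ill-conditioned quadratic f(x) = 0.5 * x^T A x (minimizer at 0).
A = np.diag([1.0, 100.0])
x_star, iters = agd_backtracking_restart(
    lambda x: 0.5 * x @ A @ x,
    lambda x: A @ x,
    np.array([1.0, 1.0]),
)
```

Halving `L` after every step is a simple heuristic that allows larger steps where the function is locally flatter. It costs extra function evaluations when the estimate has to be doubled back, and the paper's own backtracking and restart rules may differ.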
