OFFO minimization algorithms for second-order optimality and their complexity
S. Gratton, Ph. L. Toint

TL;DR
This paper introduces a class of Adagrad-inspired algorithms for smooth unconstrained optimization that achieve near-optimal convergence rates for gradient norms and second-order optimality measures without evaluating the objective function.
Contribution
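The paper proposes a novel class of Adagrad-inspired algorithms that never evaluate the objective function, yet match the second-order convergence rate known for standard methods (such as trust-region or adaptive-regularization methods) that use both function and derivative evaluations.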
Findings
Gradient norms decrease as O(1/√k)
Second-order optimality measures converge as O(1/k^{1/3})
A related "divergent stepsize" method has a slightly inferior, but also essentially sharp, rate of convergence
Abstract
An Adagrad-inspired class of algorithms for smooth unconstrained optimization is presented in which the objective function is never evaluated and yet the gradient norms decrease at least as fast as O(1/√k) while second-order optimality measures converge to zero at least as fast as O(1/k^{1/3}). This latter rate of convergence is shown to be essentially sharp and is identical to that known for more standard algorithms (like trust-region or adaptive-regularization methods) using both function and derivative evaluations. A related "divergent stepsize" method is also described, whose essentially sharp rate of convergence is slightly inferior. Finally, it is discussed how to obtain weaker second-order optimality guarantees at a (much) reduced computational cost.
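To make the "objective function is never evaluated" idea concrete, here is a minimal Python sketch of a plain Adagrad-style iteration that only ever calls the gradient. It is an illustration of the general Adagrad scaling, not the authors' algorithm: the second-order (curvature) part of their method is not reproduced, and the names `adagrad_offo`, `grad_quadratic`, `eta` and `eps` are hypothetical choices made for this example.

```python
import numpy as np


def adagrad_offo(grad, x0, n_iter=200, eta=1.0, eps=1e-2):
    """Adagrad-style iteration that uses gradients only (no f(x) calls).

    Illustrative sketch: `eta` (step scale) and `eps` (initial value of the
    accumulator) are assumptions, not parameters taken from the paper.
    """
    x = np.asarray(x0, dtype=float).copy()
    acc = np.full_like(x, eps)  # running sum of squared gradient components
    grad_norms = []
    for _ in range(n_iter):
        g = grad(x)
        grad_norms.append(float(np.linalg.norm(g)))
        acc += g * g                 # accumulate squared gradients
        x -= eta * g / np.sqrt(acc)  # componentwise scaled gradient step
    return x, grad_norms


def grad_quadratic(x):
    """Gradient of the toy objective 0.5 * (x[0]**2 + 10 * x[1]**2)."""
    return np.array([1.0, 10.0]) * x


x_final, norms = adagrad_offo(grad_quadratic, x0=[2.0, -1.5])
print(norms[0], norms[-1])
```

The step length is determined entirely by the accumulated gradient information, so no line search or function-value comparison is needed. On this toy quadratic the gradient norm shrinks much faster than the worst-case O(1/√k) bound, which is a guarantee for general smooth problems rather than a prediction for any particular one.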
