Continuized Nesterov Momentum Achieves the $O(\varepsilon^{-7/4})$ Complexity without Additional Mechanisms
Julien Hermant, Jean-François Aujol, Charles Dossal, Lorick Huang, Aude Rondepierre

TL;DR
This paper shows that a continuized Nesterov momentum algorithm with stochastic parameters achieves, in expectation, the best known $O(\varepsilon^{-7/4})$ complexity for non-convex optimization without the safeguard mechanisms (restarts, negative-curvature steps) that existing methods rely on.
Contribution
The authors show that Nesterov momentum with stochastic parameters alone attains the best known complexity in expectation, which indicates that safeguard mechanisms are not needed to reach this bound in non-convex optimization.
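To make "momentum with stochastic parameters" concrete, here is a minimal sketch of a generic continuized momentum loop: two sequences mix continuously in time, and gradient steps fire at the random jump times of a rate-1 Poisson process, so the effective mixing weights are random. This is an illustration only, not the authors' algorithm. The function name, the constant parameters `eta`, `eta_prime`, `gamma`, `gamma_prime`, and the toy quadratic objective are all assumptions, and the paper's actual parameter choices are not given on this page.

```python
import math
import random


def continuized_momentum(grad, x0, eta, eta_prime, gamma, gamma_prime,
                         n_steps, seed=0):
    """Toy continuized-momentum loop (hypothetical constant parameters)."""
    rng = random.Random(seed)
    x, z = list(x0), list(x0)
    rate = eta + eta_prime
    for _ in range(n_steps):
        # Waiting time until the next gradient step: Exp(1), i.e. the
        # inter-arrival time of a rate-1 Poisson process.
        dt = rng.expovariate(1.0)
        # Between jumps, x and z follow the linear mixing ODE
        #   dx = eta * (z - x) dt,   dz = eta_prime * (x - z) dt,
        # solved exactly: (x - z) decays, eta_prime*x + eta*z is conserved.
        decay = math.exp(-rate * dt)
        for i in range(len(x)):
            mean = (eta_prime * x[i] + eta * z[i]) / rate
            diff = (x[i] - z[i]) * decay
            x[i] = mean + (eta / rate) * diff
            z[i] = mean - (eta_prime / rate) * diff
        # At the jump time, both sequences take a gradient step at x.
        g = grad(x)
        for i in range(len(x)):
            x[i] -= gamma * g[i]
            z[i] -= gamma_prime * g[i]
    return x


if __name__ == "__main__":
    # Toy objective f(x) = 0.5 * ||x||^2, whose gradient is x itself.
    print(continuized_momentum(lambda v: list(v), [1.0, -2.0],
                               1.0, 1.0, 0.2, 0.4, 100))
```

Because the mixing between jumps is solved in closed form, the only randomness is the waiting time `dt`; this is the sense in which the discrete iteration has stochastic momentum parameters. No restart or negative-curvature step appears in the loop.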
Findings
Achieves $O(\varepsilon^{-7/4})$ complexity without safeguards
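Empirical results indicate that the two caveats of the result (a multiplicative stochastic factor with unit expectation and a restriction to a subset of the realizations) are mild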
Validates the continuized method's effectiveness in practice
Abstract
For first-order optimization of non-convex functions with Lipschitz continuous gradient and Hessian, the best known complexity for reaching an $\varepsilon$-approximation of a stationary point is $O(\varepsilon^{-7/4})$. Existing algorithms achieving this bound are based on momentum, but are always complemented with safeguard mechanisms, such as restarts or negative-curvature exploitation steps. Whether such mechanisms are fundamentally necessary has remained an open question. Leveraging the continuized method, we show that a Nesterov momentum algorithm with stochastic parameters alone achieves the same complexity in expectation. This result holds up to a multiplicative stochastic factor with unit expectation and a restriction to a subset of the realizations, both of which are independent of the objective function. We empirically verify that these constitute mild limitations.
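For reference, the setting named in the first sentence of the abstract can be written out as follows. This is the standard reading of these assumptions, not a quotation from the paper; the constant names $L_1$ and $L_2$ are labels chosen here, and the paper may use different notation.

```latex
% Lipschitz continuous gradient and Hessian (constants L_1, L_2 are assumed names)
\[
\|\nabla f(x) - \nabla f(y)\| \le L_1 \|x - y\|,
\qquad
\|\nabla^2 f(x) - \nabla^2 f(y)\| \le L_2 \|x - y\|
\qquad \text{for all } x, y.
\]
% epsilon-approximate stationary point
\[
\|\nabla f(x)\| \le \varepsilon .
\]
```

Under this reading, the $O(\varepsilon^{-7/4})$ complexity counts the number of gradient evaluations needed to produce a point satisfying the last inequality.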
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Markov Chains and Monte Carlo Methods
