Continuized Nesterov Momentum Achieves the $O(\varepsilon^{-7/4})$ Complexity without Additional Mechanisms

Julien Hermant; Jean-François Aujol; Charles Dossal; Lorick Huang; Aude Rondepierre

arXiv:2602.05504 · math.OC · February 6, 2026

TL;DR

This paper demonstrates that a continuized Nesterov momentum algorithm with stochastic parameters achieves, in expectation, the best known $O(\varepsilon^{-7/4})$ complexity for non-convex optimization without additional safeguard mechanisms, which simplifies existing methods.

Contribution

The authors show that Nesterov momentum with stochastic parameters alone attains the best known complexity in expectation, removing the need for safeguard mechanisms such as restarts or negative-curvature exploitation steps in non-convex optimization.
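For reference, the continuized Nesterov method of Even et al. (2021), on which this work builds, couples two sequences $x_t$ and $z_t$ through a linear mixing flow that is interrupted by gradient steps at the jump times $T_1 < T_2 < \dots$ of a rate-one Poisson process $N(t)$. The general form is sketched below; the parameter functions $\eta_t, \eta'_t, \gamma_t, \gamma'_t$ are left generic, because the specific choice made in this paper for the non-convex setting is not stated in this summary.

```latex
\begin{aligned}
dx_t &= \eta_t\,(z_t - x_t)\,dt \;-\; \gamma_t\,\nabla f(x_{t^-})\,dN(t),\\
dz_t &= \eta'_t\,(x_t - z_t)\,dt \;-\; \gamma'_t\,\nabla f(x_{t^-})\,dN(t).
\end{aligned}
```

Sampling this process at the jump times gives a discrete momentum algorithm whose mixing weights depend on the waiting times $\tau_k = T_{k+1} - T_k \sim \mathrm{Exp}(1)$. For constant $\eta = \eta'$, for instance, the flow between two jumps leaves $x + z$ unchanged and multiplies $x - z$ by $e^{-2\eta\tau_k}$, which is one way to read the "stochastic parameters" of the algorithm.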

Findings

01. Achieves $O(\varepsilon^{-7/4})$ complexity without safeguard mechanisms

02. Empirical results show that the stochastic limitations are mild

03. Validates the continuized method's effectiveness in practice

Abstract

For first-order optimization of non-convex functions with Lipschitz continuous gradient and Hessian, the best known complexity for reaching an $\varepsilon$-approximation of a stationary point is $O(\varepsilon^{-7/4})$. Existing algorithms achieving this bound are based on momentum, but are always complemented with safeguard mechanisms, such as restarts or negative-curvature exploitation steps. Whether such mechanisms are fundamentally necessary has remained an open question. Leveraging the continuized method, we show that a Nesterov momentum algorithm with stochastic parameters alone achieves the same complexity in expectation. This result holds up to a multiplicative stochastic factor with unit expectation and a restriction to a subset of the realizations, both of which are independent of the objective function. We empirically verify that these constitute mild limitations.
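The discrete iteration obtained by sampling a continuized process at its jump times can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `mix` and `continuized_nesterov` are hypothetical, and the constant parameters in the usage example ($\gamma = 1/L$, $\gamma' = 1/\sqrt{\mu L}$, $\eta = \eta' = \sqrt{\mu/L}$) are the strongly convex choice from Even et al. (2021), swapped in because the paper's non-convex parameter schedule and its restriction to a subset of realizations are not described here. The convex quadratic is only a sanity check, not the paper's setting.

```python
import numpy as np


def mix(x, z, tau, eta, eta_prime):
    """Exact solution of dx = eta (z - x) dt, dz = eta' (x - z) dt over a time tau."""
    total = eta + eta_prime
    mean = (eta_prime * x + eta * z) / total  # weighted mean, conserved by the flow
    gap = (x - z) * np.exp(-total * tau)  # x - z decays at rate eta + eta'
    return mean + (eta / total) * gap, mean - (eta_prime / total) * gap


def continuized_nesterov(grad, x0, gamma, gamma_prime, eta, eta_prime, n_steps, seed=0):
    """Sample a continuized Nesterov process at its first n_steps jump times.

    Hypothetical sketch with constant parameters (an assumption, not the
    paper's schedule); returns (x, z) right after the last jump.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    z = x.copy()
    for _ in range(n_steps):
        # Waiting time between jumps of a rate-1 Poisson process: Exp(1).
        tau = rng.exponential(1.0)
        # Deterministic mixing of the two sequences between jumps.
        x, z = mix(x, z, tau, eta, eta_prime)
        # Gradient steps at the jump, both using the gradient at the mixed point.
        g = grad(x)
        x = x - gamma * g
        z = z - gamma_prime * g
    return x, z


# Sanity check on f(x) = 0.5 * x^T A x (convex, so not the paper's setting).
A = np.diag([1.0, 0.25])
L, mu = 1.0, 0.25
x_final, z_final = continuized_nesterov(
    grad=lambda v: A @ v,
    x0=[1.0, 1.0],
    gamma=1.0 / L,
    gamma_prime=1.0 / np.sqrt(mu * L),
    eta=np.sqrt(mu / L),
    eta_prime=np.sqrt(mu / L),
    n_steps=300,
)
print("gradient norm:", np.linalg.norm(A @ x_final))
```

Between gradient evaluations the two sequences are mixed by the exact solution of the linear flow, so the only randomness is the exponential waiting time drawn at each step; this is what makes the effective momentum coefficients stochastic.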

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Markov Chains and Monte Carlo Methods