The Strength of Nesterov's Extrapolation in the Individual Convergence of Nonsmooth Optimization
W. Tao, Z. Pan, G. Wu, and Q. Tao

TL;DR
This paper shows that Nesterov's extrapolation makes the individual convergence of projected subgradient methods optimal for nonsmooth convex optimization. The resulting algorithms improve convergence rates and sparsity on machine learning tasks.
Contribution
The paper proves that Nesterov's extrapolation yields optimal individual convergence for projected subgradient methods on nonsmooth problems. It also extends the resulting algorithms to stochastic settings, with better sparsity and convergence guarantees.
Findings
Nesterov's extrapolation enhances individual convergence in nonsmooth convex optimization.
Modified subgradient methods achieve optimal individual convergence rates.
The proposed algorithms outperform existing methods in sparsity and convergence on machine learning tasks.
Abstract
The extrapolation strategy proposed by Nesterov, which can accelerate the convergence rate of gradient descent methods by orders of magnitude on smooth convex objectives, has led to tremendous success in training machine learning models. In this article, we theoretically study, based on Nesterov's extrapolation, the convergence of the individual iterates of projected subgradient (PSG) methods for nonsmooth convex optimization problems, which we call individual convergence. We prove that Nesterov's extrapolation has the strength to make the individual convergence of PSG optimal for nonsmooth problems. Building on this result, a direct modification of the subgradient evaluation suffices to achieve optimal individual convergence for strongly convex problems, which can be regarded as an interesting step toward the open question about stochastic gradient descent (SGD)…
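To make the idea of individual convergence concrete, here is a minimal Python sketch of a projected subgradient method driven by Nesterov's extrapolation. It uses a standard three-sequence form: the subgradient is taken at the extrapolated point y_t = (1 - theta_t) x_t + theta_t z_t. A line of algebra shows this equals x_t + theta_t (1/theta_{t-1} - 1)(x_t - x_{t-1}). The output is the last iterate x_T rather than an average, so its objective gap measures individual convergence. Several choices here are illustrative assumptions, not the authors' exact algorithm or step-size rule: the test problem f(x) = ||x - c||_1 over an L2 ball, the schedules theta_t = 2/(t+2) and gamma_t = gamma0/sqrt(t+1), and all function names.

```python
import math
from typing import List

Vector = List[float]


def project_l2_ball(v: Vector, radius: float) -> Vector:
    """Euclidean projection onto the feasible set Q = {x : ||x||_2 <= radius}."""
    norm = math.sqrt(sum(vi * vi for vi in v))
    if norm <= radius:
        return list(v)
    return [vi * radius / norm for vi in v]


def l1_objective(x: Vector, c: Vector) -> float:
    """Illustrative (assumed) nonsmooth convex objective f(x) = ||x - c||_1."""
    return sum(abs(xi - ci) for xi, ci in zip(x, c))


def l1_subgradient(x: Vector, c: Vector) -> Vector:
    """One element of the subdifferential of ||x - c||_1 (0 is used at kinks)."""
    return [1.0 if xi > ci else (-1.0 if xi < ci else 0.0) for xi, ci in zip(x, c)]


def psg_with_extrapolation(c: Vector, radius: float, iters: int,
                           gamma0: float = 0.5) -> Vector:
    """Projected subgradient method with a Nesterov-style extrapolation point.

    Hypothetical helper: returns the last (individual) iterate x_T,
    not an average of iterates.
    """
    z = [0.0] * len(c)  # subgradient-step sequence, starts inside Q
    x = list(z)         # individual iterates, x_0 = z_0
    for t in range(iters):
        theta = 2.0 / (t + 2)              # assumed weight schedule (theta_0 = 1)
        gamma = gamma0 / math.sqrt(t + 1)  # assumed diminishing step size
        # Extrapolated point y_t = (1 - theta_t) x_t + theta_t z_t.
        y = [(1 - theta) * xi + theta * zi for xi, zi in zip(x, z)]
        g = l1_subgradient(y, c)
        # Projected subgradient step on z, using the subgradient taken at y_t.
        z = project_l2_ball([zi - gamma * gi for zi, gi in zip(z, g)], radius)
        # New individual iterate: a convex combination of points in Q.
        x = [(1 - theta) * xi + theta * zi for xi, zi in zip(x, z)]
    return x


if __name__ == "__main__":
    c = [0.3, -0.2, 0.5]  # minimizer lies inside the unit ball, so f* = 0
    x_T = psg_with_extrapolation(c, radius=1.0, iters=2000)
    print("f(x_T) - f* =", l1_objective(x_T, c))
```

Because x_{t+1} is a convex combination of x_t and the projected point z_{t+1}, every individual iterate stays feasible without a second projection.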
Taxonomy
Methods: Stochastic Gradient Descent
