The Strength of Nesterov's Extrapolation in the Individual Convergence of Nonsmooth Optimization

W. Tao, Z. Pan, G. Wu, and Q. Tao

arXiv:2006.04340 · cs.LG · June 18, 2020

TL;DR

This paper shows that Nesterov's extrapolation can make the individual convergence of subgradient methods for nonsmooth convex optimization optimal, offering a new approach that improves convergence rates and sparsity in machine learning tasks.

Contribution

The paper proves that Nesterov's extrapolation achieves optimal individual convergence for nonsmooth problems and extends the resulting algorithms to stochastic settings, with better sparsity and convergence guarantees.

Findings

01. Nesterov's extrapolation enhances individual convergence in nonsmooth convex optimization.

02. Modified subgradient methods achieve optimal individual convergence rates.

03. The algorithms outperform existing methods in sparsity and convergence on machine learning tasks.

Abstract

The extrapolation strategy introduced by Nesterov, which can accelerate the convergence rate of gradient descent methods by orders of magnitude on smooth convex objectives, has led to tremendous success in training machine learning models. In this article, we theoretically study the convergence of the individual iterates of projected subgradient (PSG) methods for nonsmooth convex optimization problems (which we call individual convergence) based on Nesterov's extrapolation. We prove that Nesterov's extrapolation has the strength to make the individual convergence of PSG optimal for nonsmooth problems. In light of this, a direct modification of the subgradient evaluation suffices to achieve optimal individual convergence for strongly convex problems, which can be regarded as an interesting step toward the open question about stochastic gradient descent (SGD)…
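To make "individual convergence" concrete, here is a minimal Python sketch, assuming one common way of pairing PSG with Nesterov-style extrapolation; it is not the paper's exact algorithm. It keeps two consecutive iterates, extrapolates y_t = x_t + (1/θ_{t-1} − 1)(x_t − x_{t-1}), takes a projected subgradient step from y_t using a subgradient evaluated at x_t, and mixes the result back into x_t with weight θ_t. The weights θ_t = 1/(t+1), the step sizes α_t = α0/√(t+1), the toy objective ‖x − c‖₁ over the box [−1, 1]^5, and the names `extrapolated_psg`, `subgrad`, and `project` are hypothetical choices for illustration. The reported quantity is f at the last iterate x_T, not at an average of iterates.

```python
import numpy as np


def extrapolated_psg(subgrad, project, x0, num_iters, alpha0=0.5):
    """Hypothetical PSG with Nesterov-style extrapolation (illustrative only).

    Assumed parameters: theta_t = 1/(t+1), alpha_t = alpha0/sqrt(t+1).
    Returns the last (individual) iterate x_T, with no output averaging.
    """
    x_prev = np.asarray(x0, dtype=float).copy()
    x = x_prev.copy()
    for t in range(num_iters):
        # Extrapolation weight 1/theta_{t-1} - 1 = t - 1 for t >= 1;
        # clamped to 0 at t = 0, where there is no previous iterate.
        beta = max(t - 1, 0)
        y = x + beta * (x - x_prev)
        alpha = alpha0 / np.sqrt(t + 1)
        theta = 1.0 / (t + 1)
        # Projected subgradient step from the extrapolated point y,
        # using a subgradient evaluated at the current iterate x.
        z_next = project(y - alpha * subgrad(x))
        # Mix the new point back into the iterate with weight theta.
        x_prev, x = x, (1.0 - theta) * x + theta * z_next
    return x


# Toy nonsmooth problem (assumption): minimize ||x - c||_1 over [-1, 1]^5.
c = np.array([0.3, -0.2, 0.5, 0.0, -0.7])


def f(x):
    return np.abs(x - c).sum()


def subgrad(x):
    # np.sign returns 0 at a kink, which is a valid subgradient of |.|.
    return np.sign(x - c)


def project(v):
    # Euclidean projection onto the box [-1, 1]^5.
    return np.clip(v, -1.0, 1.0)


x0 = np.ones(5)
x_T = extrapolated_psg(subgrad, project, x0, num_iters=2000)
print(f"f(x0) = {f(x0):.3f}, f(x_T) = {f(x_T):.5f}")
```

With these weights the extrapolated point y_t coincides with the auxiliary sequence z_t of the equivalent averaging form x_{t+1} = (1 − θ_t)x_t + θ_t z_{t+1}, so only x_t and x_{t-1} need to be stored. The paper's own parameter choices and point of subgradient evaluation may differ.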

Taxonomy

Methods: Stochastic Gradient Descent