Convergence Rates for Deterministic and Stochastic Subgradient Methods Without Lipschitz Continuity

Benjamin Grimmer

arXiv:1712.04104 · math.OC · February 28, 2018

TL;DR

This paper extends convergence rate theory for subgradient methods to convex functions that are not Lipschitz continuous, giving rates for both the deterministic and the stochastic projected subgradient method.

Contribution

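It generalizes classic subgradient convergence rates by replacing the usual global Lipschitz assumption with weaker conditions: local Lipschitz continuity around the minimizers for the deterministic method, and at most quadratic growth for the stochastic method.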

Findings

1. The deterministic projected subgradient method achieves a global $O(1/\sqrt{T})$ convergence rate for convex functions that are locally Lipschitz around their minimizers.
2. The stochastic projected subgradient method attains an $O(1/\sqrt{T})$ convergence rate for convex functions with at most quadratic growth.
3. The stochastic rate improves to $O(1/T)$ under strong convexity or a weaker quadratic lower bound condition.
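The deterministic result can be made concrete with Shor's normalized step, which moves a distance $\alpha_k$ along $-g_k/\|g_k\|$ and therefore never needs a Lipschitz constant. Expanding $\|x_{k+1} - x^*\|^2$ for such a step (projection onto a convex set containing $x^*$ can only shrink this distance) and summing gives $\min_k \frac{g_k^\top(x_k - x^*)}{\|g_k\|} \le \frac{\|x_0 - x^*\|^2 + \sum_k \alpha_k^2}{2\sum_k \alpha_k}$. The left side is the distance from $x^*$ to the hyperplane through $x_k$ with normal $g_k$, so a Lipschitz constant of $f$ on a small ball around $x^*$ is enough to turn it into a bound on $f(x_k) - f^*$. Below is a minimal sketch of that method, assuming a Euclidean ball as the feasible set and steps $\alpha_k = R/\sqrt{k+1}$; the function names, the ball constraint, the step schedule, and the test objective are hypothetical illustrations, not the paper's code or exact parameters.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def project_ball(x: Sequence[float], radius: float) -> Vector:
    """Euclidean projection onto the ball {y : ||y||_2 <= radius}."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [v * (radius / norm) for v in x]


def normalized_projected_subgradient(
    f: Callable[[Vector], float],
    subgrad: Callable[[Vector], Vector],
    x0: Sequence[float],
    radius: float,
    steps: int,
) -> Tuple[Vector, float]:
    """Run x_{k+1} = P(x_k - alpha_k * g_k / ||g_k||) with alpha_k = radius / sqrt(k + 1).

    Only the direction of g_k is used, so no global Lipschitz constant is needed.
    Returns the best iterate seen and its objective value.
    """
    x = project_ball(x0, radius)
    best_x, best_f = list(x), f(x)
    for k in range(steps):
        g = subgrad(x)
        g_norm = math.sqrt(sum(v * v for v in g))
        if g_norm == 0.0:
            # 0 is a subgradient, so x is already a minimizer.
            return list(x), f(x)
        alpha = radius / math.sqrt(k + 1)
        x = project_ball([xi - alpha * gi / g_norm for xi, gi in zip(x, g)], radius)
        fx = f(x)
        if fx < best_f:
            best_x, best_f = list(x), fx
    return best_x, best_f


# Hypothetical test problem: f(x) = sum_i (x_i^4 + |x_i|), minimized at x* = 0 with f* = 0.
# The quartic term makes f non-Lipschitz on R^n, but f is locally Lipschitz near x*.
def quartic_plus_l1(x: Vector) -> float:
    return sum(v ** 4 + abs(v) for v in x)


def quartic_plus_l1_subgrad(x: Vector) -> Vector:
    # (v > 0) - (v < 0) is sign(v); sign(0) = 0 is a valid subgradient of |.| at 0.
    return [4.0 * v ** 3 + (v > 0) - (v < 0) for v in x]


if __name__ == "__main__":
    x_best, f_best = normalized_projected_subgradient(
        quartic_plus_l1, quartic_plus_l1_subgrad, [3.0, -2.0], radius=5.0, steps=2000
    )
    print(x_best, f_best)
```

With $R = 5$, $T = 2000$ and $x_0 = (3, -2)$, the inequality bounds the hyperplane distance by about 0.25, which is enough to guarantee a best objective value well under 0.5 even though the quartic term rules out any global Lipschitz constant.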

Abstract

We extend the classic convergence rate theory for subgradient methods to apply to non-Lipschitz functions. For the deterministic projected subgradient method, we present a global $O(1/\sqrt{T})$ convergence rate for any convex function which is locally Lipschitz around its minimizers. This approach is based on Shor's classic subgradient analysis and implies generalizations of the standard convergence rates for gradient descent on functions with Lipschitz or Hölder continuous gradients. Further, we show an $O(1/\sqrt{T})$ convergence rate for the stochastic projected subgradient method on convex functions with at most quadratic growth, which improves to $O(1/T)$ under either strong convexity or a weaker quadratic lower bound condition.
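The stochastic projected subgradient method replaces the exact subgradient with an unbiased stochastic estimate $g_k$ and takes unnormalized steps $x_{k+1} = P(x_k - \alpha_k g_k)$. Below is a minimal sketch under stated assumptions: the feasible set is a Euclidean ball, the caller supplies the step-size schedule, and the names `stochastic_projected_subgradient` and `sample_gradient` and the test objective are hypothetical; the paper's exact step sizes and averaging scheme may differ. The example objective $f(x) = \mathbb{E}[\tfrac{1}{2}\|x - \xi\|^2]$ has gradients that grow linearly in $\|x\|$ (so it is not globally Lipschitz) and is 1-strongly convex, so the sketch uses $\alpha_k = 1/(\mu(k+1))$ with $\mu = 1$, a common choice in the strongly convex case.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def project_ball(x: Sequence[float], radius: float) -> Vector:
    """Euclidean projection onto the ball {y : ||y||_2 <= radius}."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [v * (radius / norm) for v in x]


def stochastic_projected_subgradient(
    oracle: Callable[[Vector, random.Random], Vector],
    x0: Sequence[float],
    radius: float,
    steps: int,
    step_size: Callable[[int], float],
    rng: random.Random,
) -> Tuple[Vector, Vector]:
    """Run x_{k+1} = P(x_k - alpha_k * g_k), where g_k is an unbiased stochastic
    subgradient at x_k. Returns the last iterate and the uniform average of x_1..x_T."""
    x = project_ball(x0, radius)
    avg = [0.0] * len(x)
    for k in range(steps):
        g = oracle(x, rng)
        alpha = step_size(k)
        x = project_ball([xi - alpha * gi for xi, gi in zip(x, g)], radius)
        # Incremental mean of the iterates x_1, ..., x_{k+1}.
        avg = [a + (xi - a) / (k + 1) for a, xi in zip(avg, x)]
    return x, avg


# Hypothetical test problem: f(x) = E[0.5 * ||x - xi||^2] with xi uniform on
# [0, 2] x [-1, 1], so x* = (1, 0). The gradient x - xi is unbiased for grad f(x).
def sample_gradient(x: Vector, rng: random.Random) -> Vector:
    xi = [rng.uniform(0.0, 2.0), rng.uniform(-1.0, 1.0)]
    return [a - b for a, b in zip(x, xi)]


if __name__ == "__main__":
    last, avg = stochastic_projected_subgradient(
        sample_gradient,
        [0.0, 0.0],
        radius=3.0,
        steps=10_000,
        step_size=lambda k: 1.0 / (k + 1),  # alpha_k = 1 / (mu * (k + 1)) with mu = 1
        rng=random.Random(0),
    )
    print(last, avg)
```

With this schedule every iterate is exactly the running mean of the samples drawn so far, which makes it easy to see the last iterate approach $x^* = (1, 0)$. For convex objectives that only have at most quadratic growth, one would instead use steps on the order of $1/\sqrt{T}$ and report the averaged iterate.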
