Tight Analyses for Non-Smooth Stochastic Gradient Descent
Nicholas J. A. Harvey, Christopher Liaw, Yaniv Plan, Sikander Randhawa

TL;DR
This paper provides tight high-probability bounds for the convergence rates of stochastic gradient descent on non-smooth, strongly convex functions, showing that the last iterate's error matches deterministic rates and improving existing expectation-based bounds.
Contribution
It establishes tight high-probability convergence bounds for stochastic gradient descent on non-smooth functions, resolving open questions and improving upon prior expectation-based results.
Findings
Final iterate error is O(log(T)/T) for strongly convex functions.
Suffix averaging achieves optimal O(1/T) error with high probability.
Results extend to Lipschitz convex functions with error O(log(T)/√T).
Abstract
Consider the problem of minimizing functions that are Lipschitz and strongly convex, but not necessarily differentiable. We prove that after T steps of stochastic gradient descent, the error of the final iterate is O(log(T)/T) with high probability. We also construct a function from this class for which the error of the final iterate of deterministic gradient descent is Ω(log(T)/T). This shows that the upper bound is tight and that, in this setting, the last iterate of stochastic gradient descent has the same general error rate (with high probability) as deterministic gradient descent. This resolves both open questions posed by Shamir (2012). An intermediate step of our analysis proves that the suffix averaging method achieves error O(1/T) with high probability, which is optimal (for any first-order optimization method). This improves results of Rakhlin (2012) and Hazan…
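The setting described in the abstract can be illustrated with a small sketch. It is not the authors' code: the test function f(x) = ||x||_1 + (λ/2)||x||², the step size 1/(λt), the bounded uniform gradient noise, the projection radius, and every function and parameter name below are assumptions chosen for illustration. The sketch runs projected stochastic subgradient descent and returns both the final iterate and the average of the last half of the iterates (suffix averaging).

```python
import numpy as np


def objective(x, lam):
    # Assumed test function: f(x) = ||x||_1 + (lam/2) * ||x||_2^2.
    # It is non-differentiable at 0, lam-strongly convex, Lipschitz on a
    # bounded ball, and minimized at x = 0 with f(0) = 0.
    return np.abs(x).sum() + 0.5 * lam * np.dot(x, x)


def noisy_subgradient(x, lam, sigma, rng):
    # np.sign(0) = 0 is a valid subgradient of |.| at 0.
    g = np.sign(x) + lam * x
    # Zero-mean, bounded noise (an assumption for this illustration).
    return g + rng.uniform(-sigma, sigma, size=x.shape)


def project_ball(x, radius):
    # Euclidean projection onto the ball of the given radius.
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)


def sgd_nonsmooth(T=20000, dim=5, lam=1.0, sigma=0.5, radius=2.0, seed=0):
    rng = np.random.default_rng(seed)
    x = project_ball(np.ones(dim), radius)  # x_1
    suffix_start = T // 2 + 1               # average x_{T/2+1}, ..., x_T
    suffix_sum = np.zeros(dim)
    for t in range(1, T + 1):
        if t >= suffix_start:
            suffix_sum += x
        if t < T:
            g = noisy_subgradient(x, lam, sigma, rng)
            # Assumed step size eta_t = 1 / (lam * t).
            x = project_ball(x - g / (lam * t), radius)
    x_suffix = suffix_sum / (T - suffix_start + 1)
    return x, x_suffix


if __name__ == "__main__":
    x_last, x_suffix = sgd_nonsmooth()
    print("final iterate error: ", objective(x_last, 1.0))
    print("suffix average error:", objective(x_suffix, 1.0))
```

Because the minimum value of this test function is 0, the printed numbers are the errors f(x) − f(x*) directly. A single run on one easy function cannot confirm the log(T)/T versus 1/T gap, which concerns worst-case behavior; the sketch only shows how the two outputs compared in the abstract are computed.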
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms · Complexity and Algorithms in Graphs
