Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes
Ohad Shamir, Tong Zhang

TL;DR
This paper analyzes the convergence of stochastic gradient descent (SGD) on convex and strongly convex objectives without assuming smoothness. It proves what the authors describe as the first bounds of this kind for the last SGD iterate, and it introduces averaging schemes that attain optimal rates.
Contribution
It establishes new convergence bounds for SGD on non-smooth functions and proposes a simple, effective averaging scheme that achieves optimal rates.
Findings
Last-iterate suboptimality after T rounds scales as O(log(T)/√T) for non-smooth convex functions.
Last-iterate suboptimality scales as O(log(T)/T) for non-smooth strongly convex functions.
Proposed averaging scheme attains optimal convergence rates and is easy to implement.
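To make the setting behind these findings concrete, here is a minimal Python sketch built on assumptions made here rather than taken from the paper. The toy objective F(w) = (λ/2)w² + |w|, the Gaussian gradient noise, the feasible interval [-1, 1], and the names subgradient_sgd and objective are all hypothetical illustrations. The objective is λ-strongly convex with its minimizer w* = 0 sitting exactly on the kink of |w|, so no smoothness holds at the optimum. The sketch runs projected stochastic subgradient descent with the step size 1/(λt) that is commonly used for strongly convex problems and keeps every iterate, so the last one can be inspected directly.

```python
import random


def subgradient_sgd(lam=1.0, sigma=1.0, T=10_000, radius=1.0, seed=0):
    """Projected stochastic subgradient descent on the hypothetical toy objective
    F(w) = (lam / 2) * w**2 + |w|, which is lam-strongly convex and non-smooth
    at its minimizer w* = 0 (where F(w*) = 0).

    Returns the list of iterates w_1, ..., w_T.
    """
    rng = random.Random(seed)
    w = radius  # assumption: start at the boundary of [-radius, radius]
    iterates = []
    for t in range(1, T + 1):
        iterates.append(w)
        # sign(w) is a valid subgradient of |w|; at w = 0 we pick 0 from [-1, 1]
        sub = 1.0 if w > 0 else (-1.0 if w < 0 else 0.0)
        # noisy but unbiased subgradient; Gaussian noise stands in for sampling data
        g = lam * w + sub + rng.gauss(0.0, sigma)
        eta = 1.0 / (lam * t)  # step size 1 / (lambda * t)
        # subgradient step, then projection back onto [-radius, radius]
        w = min(radius, max(-radius, w - eta * g))
    return iterates


def objective(w, lam=1.0):
    """Value of the toy objective; its optimum is 0, so this is also the suboptimality."""
    return 0.5 * lam * w * w + abs(w)
```

For example, `ws = subgradient_sgd()` followed by `objective(ws[-1])` gives the suboptimality of the last iterate. On this easy toy problem the last iterate settles near the kink quickly, so the sketch is not meant to reproduce the O(log(T)/T) bound, which is a worst-case guarantee over the whole problem class.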
Abstract
Stochastic Gradient Descent (SGD) is one of the simplest and most popular stochastic optimization methods. While it has already been theoretically studied for decades, the classical analysis usually required non-trivial smoothness assumptions, which do not apply to many modern applications of SGD with non-smooth objective functions such as support vector machines. In this paper, we investigate the performance of SGD without such smoothness assumptions, as well as a running average scheme to convert the SGD iterates to a solution with optimal optimization accuracy. In this framework, we prove that after T rounds, the suboptimality of the last SGD iterate scales as O(log(T)/√T) for non-smooth convex objective functions, and O(log(T)/T) in the non-smooth strongly convex case. To the best of our knowledge, these are the first bounds of this kind, and almost match the minimax-optimal…
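The abstract does not spell out which running average is used. One scheme of this type, assumed here for illustration and not quoted from the text above, is polynomial-decay averaging with a parameter eta ≥ 0: at round t the new iterate receives weight (eta + 1)/(t + eta) and the previous average keeps the rest. With eta = 0 this is the ordinary uniform average of all iterates, while larger eta shifts weight toward recent iterates. The function name polynomial_decay_average is hypothetical.

```python
def polynomial_decay_average(iterates, eta=3.0):
    """Running average in which the t-th iterate gets weight (eta + 1) / (t + eta).

    eta = 0 gives the uniform average; larger eta favors recent iterates.
    Only the current average is stored, so memory use is O(1).
    Returns 0.0 for an empty sequence.
    """
    if eta < 0:
        raise ValueError("eta must be non-negative")
    avg = 0.0
    for t, x in enumerate(iterates, start=1):
        weight = (eta + 1.0) / (t + eta)  # equals 1 at t = 1, so avg starts at x_1
        avg = (1.0 - weight) * avg + weight * x
    return avg
```

For instance, averaging [1, 2, 3, 4] with eta=0 gives 2.5 (the plain mean), whereas eta=3 gives approximately 3.4, closer to the later iterates. Because the update is a single pass with constant memory, a running average like this can be attached to an existing SGD loop without storing the iterate history.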
Taxonomy
Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques
