Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization
Alexander Rakhlin, Ohad Shamir, Karthik Sridharan

TL;DR
This paper investigates the optimal convergence rates of stochastic gradient descent (SGD) for strongly convex problems, showing that simple modifications can achieve the optimal O(1/T) rate even for non-smooth cases.
Contribution
It shows that standard SGD with averaging already attains the optimal O(1/T) rate for smooth strongly convex problems, and that a simple modification of the averaging step recovers the optimal rate for non-smooth ones.
Findings
SGD attains the optimal O(1/T) rate for smooth problems.
Non-smooth problems may have an \Omega(\log(T)/T) rate with standard averaging.
A simple averaging modification recovers the optimal O(1/T) rate.
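To make the difference between the two averaging schemes concrete, here is a minimal sketch, not the authors' code. It runs projected SGD on an illustrative non-smooth, strongly convex objective F(w) = (lam/2) w^2 + |w| over the domain [-1, 1], and returns both the average of all iterates and the average of only the last alpha fraction of iterates. The objective, the step size 1/(lam * t), the noise model, the function name sgd_with_averaging, and the choice of suffix averaging as the "modification" are all assumptions made for illustration.

```python
import random


def sgd_with_averaging(T, lam=1.0, alpha=0.5, noise_std=0.1, seed=0):
    """Projected SGD on F(w) = (lam/2) * w**2 + |w| over [-1, 1].

    Returns (full_avg, suffix_avg), where full_avg averages all T iterates
    and suffix_avg averages only the last alpha fraction of them.
    All parameter names and defaults are hypothetical.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")

    rng = random.Random(seed)
    w = 1.0  # arbitrary starting point inside the domain
    iterates = []

    for t in range(1, T + 1):
        iterates.append(w)
        # A subgradient of F at w; at w == 0 we pick +1 for the |w| term.
        subgrad = lam * w + (1.0 if w >= 0 else -1.0)
        # Stochastic oracle: true subgradient plus zero-mean Gaussian noise.
        noisy_subgrad = subgrad + rng.gauss(0.0, noise_std)
        # Assumed step size 1 / (lam * t).
        w = w - noisy_subgrad / (lam * t)
        # Project back onto the bounded domain [-1, 1].
        w = max(-1.0, min(1.0, w))

    # Standard averaging: mean of all iterates.
    full_avg = sum(iterates) / T

    # Suffix averaging: mean of the last alpha * T iterates only.
    start = min(int((1.0 - alpha) * T), T - 1)
    suffix = iterates[start:]
    suffix_avg = sum(suffix) / len(suffix)

    return full_avg, suffix_avg


if __name__ == "__main__":
    full_avg, suffix_avg = sgd_with_averaging(T=5000)
    # The minimizer of F is w = 0, so both values are distances to the optimum.
    print("full average:  ", abs(full_avg))
    print("suffix average:", abs(suffix_avg))
```

Setting alpha=1.0 makes the suffix cover every iterate, so the two returned values coincide. A single run on a one-dimensional toy problem illustrates the mechanics only and should not be read as evidence for either rate.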
Abstract
Stochastic gradient descent (SGD) is a simple and popular method to solve stochastic optimization problems which arise in machine learning. For strongly convex problems, its convergence rate was known to be O(\log(T)/T), by running SGD for T iterations and returning the average point. However, recent results showed that using a different algorithm, one can get an optimal O(1/T) rate. This might lead one to believe that standard SGD is suboptimal, and maybe should even be replaced as a method of choice. In this paper, we investigate the optimality of SGD in a stochastic setting. We show that for smooth problems, the algorithm attains the optimal O(1/T) rate. However, for non-smooth problems, the convergence rate with averaging might really be \Omega(\log(T)/T), and this is not just an artifact of the analysis. On the flip side, we show that a simple modification of the averaging step…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Advanced Bandit Algorithms Research
