Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization

Alexander Rakhlin; Ohad Shamir; Karthik Sridharan

arXiv:1109.5647·cs.LG·March 19, 2015·548 cites

Open Access

TL;DR

This paper investigates the optimal convergence rates of stochastic gradient descent (SGD) for strongly convex problems, showing that simple modifications can achieve the optimal O(1/T) rate even for non-smooth cases.

Contribution

It demonstrates that a simple averaging modification in SGD attains the optimal convergence rate for both smooth and non-smooth strongly convex problems.

Findings

01

SGD attains O(1/T) rate for smooth problems.

02

Non-smooth problems may have an \Omega(\log(T)/T) rate with standard averaging.

03

A simple averaging modification recovers the optimal O(1/T) rate.

Abstract

Stochastic gradient descent (SGD) is a simple and popular method to solve stochastic optimization problems which arise in machine learning. For strongly convex problems, its convergence rate was known to be O(\log(T)/T), by running SGD for T iterations and returning the average point. However, recent results showed that using a different algorithm, one can get an optimal O(1/T) rate. This might lead one to believe that standard SGD is suboptimal, and maybe should even be replaced as a method of choice. In this paper, we investigate the optimality of SGD in a stochastic setting. We show that for smooth problems, the algorithm attains the optimal O(1/T) rate. However, for non-smooth problems, the convergence rate with averaging might really be \Omega(\log(T)/T), and this is not just an artifact of the analysis. On the flip side, we show that a simple modification of the averaging step…
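The contrast between averaging all iterates and a modified averaging step can be illustrated with a small sketch. The abstract is cut off before it names the modification, so this example assumes one natural choice: averaging only the last fraction `alpha` of the iterates (often called suffix averaging). The toy objective F(w) = (lam/2) w^2 + |w|, the 1/(lam * t) step size, the Gaussian noise model, and the names `sgd_iterates`, `full_average`, and `suffix_average` are all illustrative assumptions, not taken from the text above.

```python
import math
import random


def sgd_iterates(T, lam=1.0, radius=1.0, noise=1.0, seed=0):
    """Run projected SGD on the toy objective F(w) = (lam/2) * w**2 + |w|.

    The objective is lam-strongly convex, non-smooth at 0, and minimized
    at w* = 0. Returns the list of iterates w_1, ..., w_T.
    """
    rng = random.Random(seed)
    w = radius  # start on the boundary of the feasible interval
    iterates = []
    for t in range(1, T + 1):
        iterates.append(w)
        sign = (w > 0) - (w < 0)  # a valid subgradient of |w| (0 at w == 0)
        g = lam * w + sign + rng.gauss(0.0, noise)  # noisy subgradient
        w = w - g / (lam * t)  # assumed step size 1 / (lam * t)
        w = max(-radius, min(radius, w))  # project back onto [-radius, radius]
    return iterates


def full_average(iterates):
    """Standard averaging: the mean of all iterates."""
    return sum(iterates) / len(iterates)


def suffix_average(iterates, alpha=0.5):
    """Assumed modification: the mean of the last ceil(alpha * T) iterates."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    k = max(1, math.ceil(alpha * len(iterates)))
    return sum(iterates[-k:]) / k


if __name__ == "__main__":
    for T in (100, 1000, 10000):
        its = sgd_iterates(T)
        # The minimizer is 0, so the absolute value is the distance to it.
        print(T, abs(full_average(its)), abs(suffix_average(its, 0.5)))
```

A single run on a one-dimensional toy problem only hints at the behavior; the rates quoted in the abstract are statements about expected suboptimality, so a fair comparison would average over many seeds.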

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Advanced Bandit Algorithms Research