Stochastic gradient descent algorithms for strongly convex functions at O(1/T) convergence rates

Shenghuo Zhu

arXiv:1305.2218·cs.LG·May 13, 2013

Open Access

TL;DR

This paper shows that both a traditional stochastic gradient descent (SGD) algorithm with t-proportional weighting and an accelerated SGD algorithm achieve an O(1/T) convergence rate for strongly convex functions, removing the ln(T) factor from the O(κ ln(T)/T) rate.

Contribution

It proves that a weighted SGD with t-proportional weighting and an accelerated SGD both attain an optimal O(1/T) convergence rate for strongly convex functions.

Findings

01

Weighted SGD achieves O(κ/T) convergence rate with high probability.

02

Accelerated SGD also attains O(κ/T) convergence rate.

03

For traditional SGD, weighting proportional to t replaces the O(κ ln(T)/T) rate with O(κ/T).

Abstract

With a weighting scheme proportional to t, a traditional stochastic gradient descent (SGD) algorithm achieves a high-probability convergence rate of O(κ/T) for strongly convex functions, instead of O(κ ln(T)/T). We also prove that an accelerated SGD algorithm achieves a rate of O(κ/T).
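
The t-proportional weighting described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the abstract specifies only that the averaging weights are proportional to t. The step size 2/(mu*(t+1)), the one-dimensional quadratic objective, the Gaussian gradient noise, and the names `t_weighted_average`, `weighted_sgd`, and `make_noisy_quadratic_grad` are all assumptions made for this example.

```python
import random


def t_weighted_average(iterates):
    """Average of w_1..w_T with weights proportional to t, computed online."""
    w_bar = 0.0
    for t, w in enumerate(iterates, start=1):
        # After t terms this equals (sum_s s * w_s) / (t * (t + 1) / 2).
        w_bar += (2.0 / (t + 1)) * (w - w_bar)
    return w_bar


def weighted_sgd(grad_oracle, w0, mu, num_steps):
    """Run SGD and return (last iterate, t-weighted average of the iterates).

    The step size 2 / (mu * (t + 1)) is an assumed schedule chosen for this
    sketch; it is not taken from the paper.
    """
    w = w0
    iterates = []
    for t in range(1, num_steps + 1):
        step = 2.0 / (mu * (t + 1))
        w = w - step * grad_oracle(w)
        iterates.append(w)
    return w, t_weighted_average(iterates)


def make_noisy_quadratic_grad(mu, w_star, sigma, rng):
    """Stochastic gradient of the hypothetical objective
    f(w) = (mu / 2) * (w - w_star) ** 2, with additive Gaussian noise."""
    def grad(w):
        return mu * (w - w_star) + rng.gauss(0.0, sigma)
    return grad


if __name__ == "__main__":
    rng = random.Random(0)
    grad = make_noisy_quadratic_grad(mu=1.0, w_star=3.0, sigma=1.0, rng=rng)
    w_last, w_avg = weighted_sgd(grad, w0=0.0, mu=1.0, num_steps=5000)
    print("last iterate error:    ", abs(w_last - 3.0))
    print("weighted average error:", abs(w_avg - 3.0))
```

The running update `w_bar += 2 / (t + 1) * (w - w_bar)` reproduces the weights t / (T(T+1)/2) without a second pass over the iterates; the list of iterates is kept here only so the averaging step can be read and checked on its own.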

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Markov Chains and Monte Carlo Methods

Methods: Stochastic Gradient Descent