Stochastic gradient descent algorithms for strongly convex functions at O(1/T) convergence rates
Shenghuo Zhu

TL;DR
This paper shows that both a traditional stochastic gradient descent (SGD) algorithm with weights proportional to t and an accelerated SGD algorithm achieve an O(κ/T) convergence rate for strongly convex functions, removing the ln(T) factor from the O(κ ln(T)/T) rate.
Contribution
It proves that SGD with weights proportional to t and an accelerated SGD both attain an O(κ/T) convergence rate for strongly convex functions.
Findings
Weighted SGD achieves O(κ/T) convergence rate with high probability.
Accelerated SGD also attains O(κ/T) convergence rate.
The t-proportional weighting removes the ln(T) factor from the O(κ ln(T)/T) rate of traditional SGD.
Abstract
With a weighting scheme proportional to t, a traditional stochastic gradient descent (SGD) algorithm achieves a high-probability convergence rate of O(κ/T) for strongly convex functions, instead of O(κ ln(T)/T). We also prove that an accelerated SGD algorithm achieves a rate of O(κ/T).
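To make the weighting idea concrete, here is a minimal sketch of SGD that returns an average of its iterates with weights proportional to t. Only the t-proportional weighting comes from the abstract; the step size 2/(μ(t+1)), the quadratic test objective, and the names `weighted_sgd`, `noisy_quadratic_grad`, and `objective` are assumptions made for illustration and may differ from the algorithm analyzed in the paper.

```python
import random


def weighted_sgd(grad, x0, mu, T, seed=0):
    """Run T steps of SGD and return the iterate average weighted by t.

    grad(x, rng) must return a stochastic gradient as a list of floats.
    mu is the strong-convexity constant (assumed known here).
    """
    rng = random.Random(seed)
    x = list(x0)
    avg = [0.0] * len(x)
    weight_sum = 0.0
    for t in range(1, T + 1):
        g = grad(x, rng)
        eta = 2.0 / (mu * (t + 1))  # assumed step size, not taken from the paper
        x = [xi - eta * gi for xi, gi in zip(x, g)]
        # Online update of the weighted mean: iterate t gets weight t.
        weight_sum += t
        avg = [a + (t / weight_sum) * (xi - a) for a, xi in zip(avg, x)]
    return avg


def noisy_quadratic_grad(x, rng, mu=1.0, sigma=1.0):
    """Gradient of f(x) = 0.5 * mu * ||x||^2 plus Gaussian noise (toy problem)."""
    return [mu * xi + rng.gauss(0.0, sigma) for xi in x]


def objective(x, mu=1.0):
    """The toy strongly convex objective, minimized at the origin."""
    return 0.5 * mu * sum(xi * xi for xi in x)


if __name__ == "__main__":
    for T in (100, 1000, 10000):
        x_bar = weighted_sgd(noisy_quadratic_grad, [5.0, 5.0], mu=1.0, T=T)
        print(T, objective(x_bar))
```

Because later iterates receive larger weights, the early, inaccurate iterates contribute little to the returned point, which is the intuition behind dropping the ln(T) factor that uniform averaging incurs.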
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Markov Chains and Monte Carlo Methods
Methods: Stochastic Gradient Descent
