Fast Convergence of Stochastic Gradient Descent under a Strong Growth Condition

Mark Schmidt (INRIA Paris - Rocquencourt; LIENS); Nicolas Le Roux (INRIA Paris - Rocquencourt; LIENS)

arXiv:1308.6370 · math.OC · August 30, 2013 · 86 cites

TL;DR

This paper proves that stochastic gradient descent with a sufficiently small constant step size converges quickly under a strong growth condition: at an $O(1/k)$ rate for smooth convex functions, and at a linear rate when the function is also strongly convex.

Contribution

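The paper shows that when the norm of each component gradient $f_{i}^{'}$ is bounded by a linear function of the norm of the average gradient $f^{'}$, the basic stochastic gradient method attains fast convergence rates, including linear convergence for strongly convex functions.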
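
The growth condition can be written out explicitly. The following is a sketch, assuming the linear bound is a constant multiple of the average gradient norm; the constant $B$ and the number of component functions $n$ are symbols introduced here for illustration and do not appear in the summary above.

$$
f(x) = \frac{1}{n} \sum_{i=1}^{n} f_{i}(x), \qquad \max_{i} \| f_{i}^{'}(x) \| \le B \, \| f^{'}(x) \| \quad \text{for all } x.
$$

One consequence of a bound of this form is that at any point where $f^{'}(x) = 0$, every individual gradient $f_{i}^{'}(x)$ must vanish as well, so the stochastic gradient carries no noise at a minimizer of $f$.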

Findings

01

SGD has an $O(1/k)$ convergence rate for smooth convex functions under the growth condition.

02

Linear convergence is achieved if the function is strongly convex.

03

Both rates are obtained with the basic stochastic gradient method and a sufficiently small constant step size.

Abstract

We consider optimizing a smooth convex function $f$ that is the average of a set of differentiable functions $f_{i}$, under the assumption considered by Solodov [1998] and Tseng [1998] that the norm of each gradient $f_{i}^{'}$ is bounded by a linear function of the norm of the average gradient $f^{'}$. We show that under these assumptions the basic stochastic gradient method with a sufficiently small constant step size has an $O(1/k)$ convergence rate, and has a linear convergence rate if $f$ is strongly convex.
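
The setting in the abstract can be illustrated numerically. The sketch below is not taken from the paper: it assumes a consistent least-squares problem, where every $f_{i}$ is minimized at the same point, as one simple case in which each component gradient is bounded by a multiple of the average gradient. The names `sgd_constant_step`, `growth_ratio`, and the step size `1 / max_i ||a_i||^2` are illustrative choices, not the constants from the analysis.

```python
import numpy as np

def sgd_constant_step(A, b, x0, step, iters, rng):
    """Plain SGD on f(x) = (1/n) * sum_i 0.5 * (a_i @ x - b_i)**2."""
    x = x0.copy()
    n = A.shape[0]
    for _ in range(iters):
        i = rng.integers(n)                # sample one component uniformly
        grad_i = (A[i] @ x - b[i]) * A[i]  # gradient of f_i at x
        x -= step * grad_i                 # constant step size, no decay
    return x

def growth_ratio(A, b, x):
    """max_i ||f_i'(x)|| / ||f'(x)|| evaluated at a single point x."""
    residuals = A @ x - b
    per_sample = np.abs(residuals) * np.linalg.norm(A, axis=1)
    full = np.linalg.norm(A.T @ residuals / A.shape[0])
    return per_sample.max() / full

rng = np.random.default_rng(0)
n, d = 50, 5
A = rng.standard_normal((n, d))
x_star = rng.standard_normal(d)
b = A @ x_star  # consistent system: every f_i is minimized at x_star

x0 = np.zeros(d)
# Illustrative step size (an assumption, not the paper's constant).
step = 1.0 / np.max(np.sum(A**2, axis=1))
x_final = sgd_constant_step(A, b, x0, step, iters=5000, rng=rng)

initial_error = np.linalg.norm(x0 - x_star)
final_error = np.linalg.norm(x_final - x_star)
print(f"growth ratio at x0: {growth_ratio(A, b, x0):.2f}")
print(f"initial error: {initial_error:.3e}, final error: {final_error:.3e}")
```

Because the objective here is strongly convex and all component gradients vanish at `x_star`, the error keeps shrinking with a fixed step size instead of stalling at a noise floor, which is the qualitative behavior the abstract describes.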

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Complexity and Algorithms in Graphs