Unified Optimal Analysis of the (Stochastic) Gradient Method

Sebastian U. Stich

arXiv:1907.04232·cs.LG·December 24, 2019·55 cites

TL;DR

This paper provides a unified analysis of stochastic and deterministic gradient methods, establishing convergence rates under mild smoothness assumptions and matching the best known iteration complexities.

Contribution

It offers a simple proof for convergence of SGD under milder smoothness conditions and unifies the analysis for both stochastic and deterministic gradient methods.

Findings

01

Convergence rate for SGD of $O(LR^2 e^{-\frac{\mu}{4L}T} + \frac{\sigma^2}{\mu T})$.

02

Recovery of exponential convergence in the interpolation setting where $\sigma^2=0$.

03

The bound matches the best known iteration complexity for GD and SGD, up to constants.
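To connect the rate to an iteration complexity, here is a short derivation that asks how many iterations make each term of the bound at most $\varepsilon/2$. It treats the constant hidden in the $O(\cdot)$ as 1, which is a simplifying assumption of this sketch rather than a statement from the paper:

$$
LR^2 e^{-\frac{\mu}{4L}T} \le \frac{\varepsilon}{2}
\iff T \ge \frac{4L}{\mu}\ln\frac{2LR^2}{\varepsilon},
\qquad
\frac{\sigma^2}{\mu T} \le \frac{\varepsilon}{2}
\iff T \ge \frac{2\sigma^2}{\mu\varepsilon},
$$

$$
\text{so it suffices that}\quad
T = O\!\left(\frac{L}{\mu}\log\frac{LR^2}{\varepsilon} + \frac{\sigma^2}{\mu\varepsilon}\right).
$$

With $\sigma^2 = 0$ only the first term remains, i.e. the familiar $\kappa \log(1/\varepsilon)$ iteration count of gradient descent with condition number $\kappa = L/\mu$.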

Abstract

In this note we give a simple proof for the convergence of stochastic gradient descent (SGD) on $\mu$-convex functions under a (milder than standard) $L$-smoothness assumption. We show that for carefully chosen stepsizes SGD converges after $T$ iterations as $O(LR^2 \exp[-\frac{\mu}{4L}T] + \frac{\sigma^2}{\mu T})$, where $\sigma^2$ measures the variance of the stochastic noise. For deterministic gradient descent (GD) and for SGD in the interpolation setting we have $\sigma^2 = 0$, and we recover the exponential convergence rate. The bound matches the best known iteration complexity of GD and SGD, up to constants.
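The two regimes of the bound can be observed with a small simulation. This is a minimal, hypothetical sketch: the test function $f(x) = \frac{1}{2}(\mu x_1^2 + L x_2^2)$, the additive Gaussian noise with $\mathbb{E}\|\xi\|^2 = \sigma^2$, the switch point $t_0 = T/2$ and the constant-then-$1/t$ stepsize schedule are illustrative assumptions, not the paper's exact stepsizes or averaging scheme (the sketch simply reports the last iterate). With `sigma=0` it reduces to deterministic GD and the error decays exponentially; with `sigma>0` the final error shrinks roughly like $1/T$.

```python
import math
import random


def sgd_quadratic(T, sigma, mu=1.0, L=10.0, x0=(1.0, 1.0), seed=0):
    """Run SGD on f(x) = 0.5 * (mu * x[0]**2 + L * x[1]**2); return f(x_T) - f*.

    Hypothetical test problem: f is mu-strongly convex and L-smooth with
    minimizer x* = 0 and f* = 0. Each stochastic gradient is the exact
    gradient plus Gaussian noise whose expected squared norm is sigma**2.
    """
    rng = random.Random(seed)
    curv = (mu, L)                      # diagonal Hessian entries
    x = list(x0)
    s = sigma / math.sqrt(len(x))       # per-coordinate noise std
    t0 = T // 2                         # assumed switch point between phases
    kappa = 2.0 * L / mu                # makes the schedule equal 1/L at t0
    for t in range(T):
        if t < t0:
            gamma = 1.0 / L             # constant phase: exponential decay
        else:
            # decaying phase: gamma ~ 2 / (mu * t), never larger than 1/L
            gamma = 2.0 / (mu * (kappa + t - t0))
        for i in range(len(x)):
            g = curv[i] * x[i] + rng.gauss(0.0, s)  # noisy gradient coordinate
            x[i] -= gamma * g
    return 0.5 * sum(c * xi * xi for c, xi in zip(curv, x))


def mean_error(T, sigma, runs=100):
    """Average the final suboptimality over independent seeds."""
    return sum(sgd_quadratic(T, sigma, seed=r) for r in range(runs)) / runs


if __name__ == "__main__":
    # Deterministic GD (sigma = 0): the exponential term alone.
    print("GD, T=1000:", sgd_quadratic(1000, sigma=0.0))
    # Noisy SGD: the sigma^2 / (mu T) term dominates for large T.
    for T in (200, 800, 3200):
        print(f"SGD, T={T}:", mean_error(T, sigma=1.0))
```

The first half of the run uses the largest admissible stepsize $1/L$ to shrink the initial distance quickly; the second half decays the stepsize so that the noise contribution averages out, which is the usual way to obtain a $\sigma^2/(\mu T)$ term without an extra logarithmic factor.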

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Numerical methods in inverse problems

Methods: Stochastic Gradient Descent