Sharper Rates and Flexible Framework for Nonconvex SGD with Client and Data Sampling

Alexander Tyurin; Lukang Sun; Konstantin Burlachenko; Peter Richtárik

arXiv:2206.02275·cs.LG·June 7, 2022

TL;DR

This paper improves the theoretical understanding of stochastic gradient descent for nonconvex optimization by generalizing sampling mechanisms, explicitly incorporating smoothness constants, and providing sharper bounds relevant for federated learning.

Contribution

It generalizes the PAGE algorithm to work with arbitrary unbiased sampling, explicitly accounts for smoothness constants, and offers sharper analysis and bounds for nonconvex SGD.
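To make "unbiased sampling" concrete, the sketch below checks one generic example: a single-index importance-sampling estimator of the average gradient. This is not taken from the paper; the estimator, the probabilities `q`, and all names are assumptions chosen for illustration. It only shows that reweighting a sampled gradient by 1/(n q_i) gives the exact average in expectation.

```python
import numpy as np

def sampled_gradient(grads, i, q):
    """Hypothetical single-index estimator: grad_i / (n * q_i)."""
    n = len(grads)
    return grads[i] / (n * q[i])

rng = np.random.default_rng(0)
grads = rng.normal(size=(6, 3))      # stand-ins for the n = 6 individual gradients

# Assumed non-uniform sampling probabilities, proportional to gradient norms.
norms = np.linalg.norm(grads, axis=1)
q = norms / norms.sum()

# Exact expectation over the sampled index i ~ q (no Monte Carlo needed).
expected = sum(q[i] * sampled_gradient(grads, i, q) for i in range(len(grads)))
true_mean = grads.mean(axis=0)

print(np.allclose(expected, true_mean))
```

Any strictly positive `q` that sums to one passes the same check; the choice of `q` affects the variance of the estimator, not its mean.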

Findings

01 Generalized PAGE for flexible sampling

02 Explicitly incorporated smoothness constants

03 Sharper bounds on convergence rates

Abstract

We revisit the classical problem of finding an approximately stationary point of the average of $n$ smooth and possibly nonconvex functions. The optimal complexity of stochastic first-order methods in terms of the number of gradient evaluations of individual functions is $O(n + n^{1/2} \varepsilon^{-1})$, attained by the optimal SGD methods SPIDER (arXiv:1807.01695) and PAGE (arXiv:2008.10898), for example, where $\varepsilon$ is the error tolerance. However, i) the big-$O$ notation hides crucial dependencies on the smoothness constants associated with the functions, and ii) the rates and theory in these methods assume simplistic sampling mechanisms that do not offer any flexibility. In this work we remedy the situation. First, we generalize the PAGE algorithm so that it can…
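For orientation, here is a minimal sketch of the baseline PAGE update from arXiv:2008.10898 with uniform minibatch sampling, i.e. the simple sampling mechanism the abstract refers to, not the generalized version this paper proposes. The toy objective (least squares plus a nonconvex regularizer), the step size, the probability `p`, the batch size, and all function names are assumptions made for illustration and are not tuned according to any theory.

```python
import numpy as np

def make_problem(n=20, d=5, lam=0.1, seed=0):
    """Toy finite sum: f_i(x) = 0.5*(a_i.x - b_i)^2 + lam * sum_j x_j^2/(1+x_j^2)."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(n, d))
    b = rng.normal(size=n)

    def grad_i(i, x):
        # Gradient of the i-th function, including the nonconvex regularizer.
        return A[i] * (A[i] @ x - b[i]) + lam * 2.0 * x / (1.0 + x**2) ** 2

    return grad_i

def full_grad(grad_i, n, x):
    """Gradient of the average (1/n) * sum_i f_i at x."""
    return np.mean([grad_i(i, x) for i in range(n)], axis=0)

def page(grad_i, n, x0, step, p, batch, iters, rng):
    """PAGE-style loop with uniform sampling without replacement."""
    x = x0.copy()
    g = full_grad(grad_i, n, x)              # start from a full gradient
    for _ in range(iters):
        x_new = x - step * g                 # gradient-type step with estimator g
        if rng.random() < p:
            # With probability p: recompute the full gradient.
            g = full_grad(grad_i, n, x_new)
        else:
            # Otherwise: reuse g and correct it with minibatch gradient differences.
            idx = rng.choice(n, size=batch, replace=False)
            g = g + np.mean([grad_i(i, x_new) - grad_i(i, x) for i in idx], axis=0)
        x = x_new
    return x

n, d = 20, 5
grad_i = make_problem(n=n, d=d)
x0 = np.ones(d)
initial_norm = np.linalg.norm(full_grad(grad_i, n, x0))

# Illustrative hyperparameters only.
x_final = page(grad_i, n, x0, step=0.02, p=0.2, batch=4, iters=1000,
               rng=np.random.default_rng(1))
final_norm = np.linalg.norm(full_grad(grad_i, n, x_final))

print(f"gradient norm: {initial_norm:.3e} -> {final_norm:.3e}")
```

Replacing the uniform `rng.choice` call with another unbiased sampling scheme is the kind of flexibility the abstract describes, but the details of how the paper does this are not reproduced here.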

Code & Models

Repositories

mysteryresearcher/sampling-in-optimal-sgd (PyTorch, official)

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Complexity and Algorithms in Graphs

Methods: Stochastic Gradient Descent