Sharper Rates and Flexible Framework for Nonconvex SGD with Client and Data Sampling
Alexander Tyurin, Lukang Sun, Konstantin Burlachenko, Peter Richtárik

TL;DR
This paper improves the theoretical understanding of stochastic gradient descent for nonconvex optimization by generalizing sampling mechanisms, explicitly incorporating smoothness constants, and providing sharper bounds relevant for federated learning.
Contribution
It generalizes the PAGE algorithm to work with arbitrary unbiased sampling, explicitly accounts for smoothness constants, and offers sharper analysis and bounds for nonconvex SGD.
Findings
Generalized PAGE for flexible sampling
Explicitly incorporated smoothness constants
Sharper bounds on convergence rates
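To make the first finding concrete, here is a minimal sketch of a PAGE-style loop in which the minibatch estimator is a pluggable function. It is an illustration under stated assumptions, not the authors' implementation: the names `page`, `uniform_weights`, and `importance_weights` are hypothetical, the sampler is modeled as random weights w with E[w_i] = 1 (so the weighted average is unbiased), and the toy objective, step size, and probability p are arbitrary choices. The toy functions are convex quadratics, chosen only so that convergence is easy to check.

```python
import numpy as np

def uniform_weights(n, batch, rng):
    # Sample `batch` indices uniformly with replacement; E[w_i] = 1.
    counts = rng.multinomial(batch, np.full(n, 1.0 / n))
    return counts * n / batch

def importance_weights(q, batch, rng):
    # Sample index i with probability q[i]; reweight by 1 / (batch * q[i]) so E[w_i] = 1.
    counts = rng.multinomial(batch, q)
    return counts / (batch * q)

def page(grads, x0, step, p, sampler, iters, rng):
    """PAGE-style loop. grads(x) returns an (n, d) array whose i-th row is grad f_i(x).

    sampler(rng) returns a length-n weight vector w with E[w_i] = 1.
    """
    x = x0.copy()
    g = grads(x).mean(axis=0)  # start from the full gradient
    for _ in range(iters):
        x_new = x - step * g
        if rng.random() < p:
            # With probability p, recompute the full gradient.
            g = grads(x_new).mean(axis=0)
        else:
            # Otherwise, correct g with an unbiased estimate of the gradient difference.
            # For simplicity all n rows are computed; a real implementation
            # would evaluate only the rows with nonzero weight.
            w = sampler(rng)
            diff = grads(x_new) - grads(x)
            g = g + (w[:, None] * diff).mean(axis=0)
        x = x_new
    return x

# Toy problem (an assumption for illustration): f_i(x) = 0.5 * a_i * ||x - c_i||^2,
# so a_i plays the role of the smoothness constant of f_i.
rng = np.random.default_rng(0)
n, d = 20, 5
a = rng.uniform(0.5, 2.0, size=n)
c = rng.normal(size=(n, d))
grads = lambda x: a[:, None] * (x[None, :] - c)

x_uni = page(grads, np.zeros(d), 0.2, 0.2, lambda r: uniform_weights(n, 4, r), 500, rng)
q = a / a.sum()  # sample proportionally to the smoothness constants
x_imp = page(grads, np.zeros(d), 0.2, 0.2, lambda r: importance_weights(q, 4, r), 500, rng)

print(np.linalg.norm(grads(x_uni).mean(axis=0)))
print(np.linalg.norm(grads(x_imp).mean(axis=0)))
```

On this particular toy problem the gradient differences scale with a_i, so sampling with probabilities proportional to a_i makes the difference estimator exact. That is a small hint at why rates that track the individual smoothness constants, rather than hiding them, can favor one sampling scheme over another.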
Abstract
We revisit the classical problem of finding an approximately stationary point of the average of $n$ smooth and possibly nonconvex functions. The optimal complexity of stochastic first-order methods in terms of the number of gradient evaluations of individual functions is $\mathcal{O}\left(n + n^{1/2}\varepsilon^{-1}\right)$, attained by the optimal SGD methods SPIDER (arXiv:1807.01695) and PAGE (arXiv:2008.10898), for example, where $\varepsilon$ is the error tolerance. However, i) the big-$\mathcal{O}$ notation hides crucial dependencies on the smoothness constants associated with the functions, and ii) the rates and theory in these methods assume simplistic sampling mechanisms that do not offer any flexibility. In this work we remedy the situation. First, we generalize the PAGE algorithm so that it can…
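For reference, the problem the abstract describes can be written out as follows. This is a sketch under assumptions: the dimension $d$ and the output point $\hat{x}$ are notation introduced here, and stationarity is measured by the expected squared gradient norm, a common convention that the truncated abstract does not spell out.

```latex
\min_{x \in \mathbb{R}^d} \; f(x) := \frac{1}{n} \sum_{i=1}^{n} f_i(x),
\qquad
\text{find } \hat{x} \text{ such that } \;
\mathbb{E}\left[ \left\| \nabla f(\hat{x}) \right\|^2 \right] \le \varepsilon .
```

Here each $f_i$ is smooth (its gradient is Lipschitz) but need not be convex, and complexity counts the number of evaluations of individual gradients $\nabla f_i$.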
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Complexity and Algorithms in Graphs
Methods: Stochastic Gradient Descent
