A view of mini-batch SGD via generating functions: conditions of convergence, phase transitions, benefit from negative momenta
Maksim Velikanov, Denis Kuznedelev, Dmitry Yarotsky

TL;DR
This paper introduces an analytic framework, based on generating functions, for mini-batch SGD with momentum on linear models. The framework yields conditions for convergence, identifies phase transitions between regimes, and shows a benefit from negative momenta.
Contribution
The paper develops a spectral generating-function approach for analyzing noise-averaged properties of mini-batch SGD with momentum for linear models at constant learning rate, momentum, and batch size. It provides explicit stability conditions and loss asymptotics.
Findings
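SGD dynamics exhibit several convergent and divergent regimes, depending on the spectral distributions of the problem.
The convergent regimes admit explicit stability conditions, and explicit loss asymptotics when the spectral distributions follow power laws.
The optimal convergence rate can be attained at negative momenta.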
Abstract
Mini-batch SGD with momentum is a fundamental algorithm for learning large predictive models. In this paper we develop a new analytic framework to analyze noise-averaged properties of mini-batch SGD for linear models at constant learning rates, momenta and sizes of batches. Our key idea is to consider the dynamics of the second moments of model parameters for a special family of "Spectrally Expressible" approximations. This allows us to obtain an explicit expression for the generating function of the sequence of loss values. By analyzing this generating function, we find, in particular, that 1) the SGD dynamics exhibits several convergent and divergent regimes depending on the spectral distributions of the problem; 2) the convergent regimes admit explicit stability conditions, and explicit loss asymptotics in the case of power-law spectral distributions; 3) the optimal convergence rate can…
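The setting described in the abstract can be illustrated numerically. The following is a minimal sketch, not the authors' code: it assumes the standard heavy-ball form of momentum SGD (v <- beta*v - lr*grad, w <- w + v), a synthetic realizable least-squares problem, and batches drawn without replacement. The function names (`minibatch_sgd_momentum`, `averaged_losses`, `generating_function`) and all hyperparameter values are hypothetical choices made for illustration. The generating function is the usual power series sum_t L_t z^t, truncated to the simulated steps.

```python
import numpy as np


def minibatch_sgd_momentum(X, y, lr, momentum, batch_size, steps, seed=0):
    """Heavy-ball mini-batch SGD on L(w) = mean((Xw - y)^2) / 2 (assumed form)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    v = np.zeros(d)  # momentum (velocity) buffer
    losses = np.empty(steps + 1)
    losses[0] = 0.5 * np.mean((X @ w - y) ** 2)
    for t in range(steps):
        # Assumption: each batch is sampled without replacement.
        idx = rng.choice(n, size=batch_size, replace=False)
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch_size
        v = momentum * v - lr * grad  # momentum may be negative
        w = w + v
        losses[t + 1] = 0.5 * np.mean((X @ w - y) ** 2)
    return w, losses


def averaged_losses(X, y, lr, momentum, batch_size, steps, n_runs=5):
    """Monte Carlo estimate of the noise-averaged loss sequence."""
    runs = [
        minibatch_sgd_momentum(X, y, lr, momentum, batch_size, steps, seed=s)[1]
        for s in range(n_runs)
    ]
    return np.mean(runs, axis=0)


def generating_function(losses, z):
    """Truncated generating function: sum over t of L_t * z**t."""
    t = np.arange(len(losses))
    return float(np.sum(losses * np.power(float(z), t)))


# Synthetic realizable linear problem (illustrative values only).
data_rng = np.random.default_rng(123)
X = data_rng.standard_normal((200, 10))
w_star = data_rng.standard_normal(10)
y = X @ w_star

losses_neg = averaged_losses(X, y, lr=0.05, momentum=-0.3, batch_size=10, steps=300)
losses_pos = averaged_losses(X, y, lr=0.05, momentum=0.3, batch_size=10, steps=300)
```

Comparing `losses_neg` and `losses_pos`, or `generating_function(losses, z)` for `z` slightly below 1, gives an empirical feel for how momentum affects the averaged loss sequence. This toy experiment does not reproduce the paper's regimes or asymptotics.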
Taxonomy
Topics: Neural Networks and Applications · Gaussian Processes and Bayesian Inference · Statistical Methods and Inference
Methods: SGD with Momentum · Stochastic Gradient Descent
