Mixing of Stochastic Accelerated Gradient Descent
Peiyuan Zhang, Hadi Daneshmand, Thomas Hofmann

TL;DR
This paper analyzes the mixing properties of stochastic accelerated gradient descent (SAGD) on least-squares regression. It shows through illustrative examples that SAGD mixes faster than SGD, and it derives an explicit mixing rate that depends on the first four moments of the data distribution.
Contribution
The paper proves that the chain of SAGD iterates is geometrically ergodic with an explicit mixing rate, and it compares that chain's mixing speed with that of SGD. The proofs rest on a novel non-asymptotic analysis of matrix products.
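For readers unfamiliar with the term, one standard textbook statement of geometric ergodicity is given here. It is written in total variation distance as an assumption; the paper's own metric and constants may differ, and the symbols P (transition kernel of the iterate chain), \pi (invariant distribution), w_0 (starting point), M and \rho are notation chosen for this note.

```latex
\[
\bigl\| P^{n}(w_0, \cdot) - \pi \bigr\|_{\mathrm{TV}}
\;\le\; M(w_0)\, \rho^{n},
\qquad 0 \le \rho < 1, \quad n \ge 0 .
\]
```

A smaller \rho means the chain forgets its starting point faster, which is the sense in which one chain "mixes faster" than another.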
Findings
SAGD and SGD simulate the same invariant distribution.
In illustrative examples, the chain of SAGD iterates mixes faster than the chain of SGD iterates.
The explicit mixing rate depends on the first four moments of the data distribution.
Abstract
We study the mixing properties of stochastic accelerated gradient descent (SAGD) on least-squares regression. First, we show that stochastic gradient descent (SGD) and SAGD simulate the same invariant distribution. Motivated by this, we then establish a mixing rate for the SAGD iterates and compare it with that of the SGD iterates. Theoretically, we prove that the chain of SAGD iterates is geometrically ergodic, given a proper choice of parameters and under regularity assumptions on the input distribution. More specifically, we derive an explicit mixing rate depending on the first four moments of the data distribution. By means of illustrative examples, we prove that the SAGD-iterate chain mixes faster than the chain of iterates obtained by SGD. Furthermore, we highlight applications of the established mixing rate in the convergence analysis of SAGD on realizable objectives. The proposed…
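The setting described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' algorithm: the accelerated update below is a generic Nesterov-style step, and the Gaussian inputs, the step size `lr`, the `momentum` value, the noise level, and the function names `sgd_step`, `sagd_step`, and `run_chains` are all hypothetical choices made for this example. Both chains consume the same stream of samples so that their trajectories can be compared directly.

```python
import numpy as np


def sgd_step(w, x, y, lr):
    """One SGD step on the single-sample loss 0.5 * (x @ w - y) ** 2."""
    return w - lr * (x @ w - y) * x


def sagd_step(w, w_prev, x, y, lr, momentum):
    """One Nesterov-style step (an assumed form, not taken from the paper)."""
    v = w + momentum * (w - w_prev)      # extrapolate along the last move
    w_next = v - lr * (x @ v - y) * x    # stochastic gradient step at v
    return w_next, w                     # new iterate and new "previous" iterate


def run_chains(n_steps=2000, dim=5, lr=0.05, momentum=0.5, noise=0.1, seed=0):
    """Run both chains on the same samples; return distances to w_star."""
    rng = np.random.default_rng(seed)
    w_star = rng.normal(size=dim)        # hypothetical ground-truth weights
    w_sgd = np.zeros(dim)
    w_acc = np.zeros(dim)
    w_acc_prev = np.zeros(dim)
    err_sgd = np.empty(n_steps)
    err_acc = np.empty(n_steps)
    for t in range(n_steps):
        x = rng.normal(size=dim)                 # fresh input each step
        y = x @ w_star + noise * rng.normal()    # noisy linear response
        w_sgd = sgd_step(w_sgd, x, y, lr)
        w_acc, w_acc_prev = sagd_step(w_acc, w_acc_prev, x, y, lr, momentum)
        err_sgd[t] = np.linalg.norm(w_sgd - w_star)
        err_acc[t] = np.linalg.norm(w_acc - w_star)
    return err_sgd, err_acc


if __name__ == "__main__":
    err_sgd, err_acc = run_chains()
    print("final distance, SGD :", err_sgd[-1])
    print("final distance, SAGD:", err_acc[-1])
```

Because the response is noisy, neither chain converges to a point; each keeps fluctuating around `w_star`, which is the regime where mixing behavior is the relevant question. Plotting `err_sgd` and `err_acc` against the step index gives a rough picture of how quickly each chain leaves its starting point, though a single run with these arbitrary parameters should not be read as evidence for or against the paper's rates.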
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Sparse and Compressive Sensing Techniques
Methods: Stochastic Gradient Descent
