Effect of Random Learning Rate: Theoretical Analysis of SGD Dynamics in Non-Convex Optimization via Stationary Distribution
Naoki Yoshida, Shogo Nakakita, Masaaki Imaizumi

TL;DR
This paper introduces Poisson SGD, a variant of stochastic gradient descent (SGD) with a random learning rate, and proves that it converges to a stationary distribution in non-convex optimization. It also shows that the method can find global minima and analyzes its generalization error.
Contribution
The paper presents a novel SGD variant with degenerated update directions that uses a random learning rate and establishes its convergence to a stationary distribution in non-convex settings.
Findings
Poisson SGD converges to a stationary distribution under weak assumptions.
Poisson SGD can find global minima in non-convex problems.
The stationary distribution is approximated using techniques from piecewise deterministic Markov processes (PDMP) and the bouncy particle sampler (BPS).
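To make the idea of a random learning rate concrete, here is a minimal Python sketch. It is not the paper's Poisson SGD algorithm: the toy objective, the function names `grad` and `random_lr_sgd`, the Gaussian gradient noise, and the choice of an exponentially distributed step size (the waiting time between events of a Poisson process) are all assumptions made for illustration only.

```python
import random


def grad(x):
    # Gradient of a toy non-convex objective f(x) = x**4 - 3*x**2 + x
    # (hypothetical example, not taken from the paper).
    return 4 * x**3 - 6 * x + 1


def random_lr_sgd(x0, n_steps, mean_lr=0.01, noise=0.1, seed=0):
    """Run noisy gradient descent where each step size is drawn at random."""
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(n_steps):
        # Assumption: step size ~ Exponential with mean `mean_lr`.
        lr = rng.expovariate(1.0 / mean_lr)
        # Assumption: stochastic gradient = true gradient + Gaussian noise.
        g = grad(x) + rng.gauss(0.0, noise)
        x = x - lr * g
        path.append(x)
    return path


path = random_lr_sgd(x0=2.0, n_steps=1000)
print(path[-1])
```

Running the loop for many steps and looking at a histogram of the later iterates gives an informal picture of what a stationary distribution over parameters means, although this sketch carries none of the paper's guarantees.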
Abstract
We consider a variant of the stochastic gradient descent (SGD) with a random learning rate and reveal its convergence properties. SGD is a widely used stochastic optimization algorithm in machine learning, especially deep learning. Numerous studies reveal the convergence properties of SGD and its theoretically favorable variants. Among these, the analysis of convergence using a stationary distribution of updated parameters provides generalizable results. However, to obtain a stationary distribution, the update direction of the parameters must not degenerate, which limits the applicable variants of SGD. In this study, we consider a novel SGD variant, Poisson SGD, which has degenerated parameter update directions and instead utilizes a random learning rate. Consequently, we demonstrate that a distribution of a parameter updated by Poisson SGD converges to a stationary distribution under…
Taxonomy
Topics: Neural Networks and Applications
Methods: Stochastic Gradient Descent
