Effect of Random Learning Rate: Theoretical Analysis of SGD Dynamics in Non-Convex Optimization via Stationary Distribution

Naoki Yoshida; Shogo Nakakita; Masaaki Imaizumi

arXiv:2406.16032·stat.ML·September 9, 2025

Open Access

TL;DR

This paper introduces Poisson SGD, an SGD variant with a random learning rate, and proves that the distribution of its parameters converges to a stationary distribution in non-convex optimization. It further shows that the method can find global minima and analyzes its generalization error.

Contribution

The paper presents Poisson SGD, a novel SGD variant whose parameter update directions are degenerate and which relies on a random learning rate instead, and establishes that the distribution of its parameters converges to a stationary distribution in non-convex settings.

Findings

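01. Poisson SGD converges to a stationary distribution under weak assumptions.

02. Poisson SGD can find global minima in non-convex problems.

03. The distribution of Poisson SGD is approximated by that of the bouncy particle sampler (BPS), whose stationary distribution is derived using the theory of piecewise-deterministic Markov processes (PDMP).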

Abstract

We consider a variant of the stochastic gradient descent (SGD) with a random learning rate and reveal its convergence properties. SGD is a widely used stochastic optimization algorithm in machine learning, especially deep learning. Numerous studies reveal the convergence properties of SGD and its theoretically favorable variants. Among these, the analysis of convergence using a stationary distribution of updated parameters provides generalizable results. However, to obtain a stationary distribution, the update direction of the parameters must not degenerate, which limits the applicable variants of SGD. In this study, we consider a novel SGD variant, Poisson SGD, which has degenerated parameter update directions and instead utilizes a random learning rate. Consequently, we demonstrate that a distribution of a parameter updated by Poisson SGD converges to a stationary distribution under…
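To make the idea of a random learning rate concrete, the following is a minimal toy sketch, not the paper's Poisson SGD. Everything in it is an assumption made for illustration: the one-dimensional double-well loss, the Gaussian gradient noise, the function name `random_lr_sgd`, and the choice to draw each step size from an exponential distribution (picked only because waiting times of a Poisson process are exponentially distributed; the abstract above does not specify the actual scheme).

```python
import random


def noisy_grad(x, rng, noise_std=1.0):
    """Noisy gradient of the toy loss f(x) = x^4 - 3x^2 + x (assumed, not from the paper)."""
    exact = 4.0 * x * x * x - 6.0 * x + 1.0
    return exact + rng.gauss(0.0, noise_std)


def random_lr_sgd(n_steps=2000, mean_lr=0.01, x0=1.5, seed=0):
    """Run SGD where each step size is drawn at random.

    The step size is sampled from an exponential distribution with mean
    `mean_lr`. This is an illustrative assumption, not the authors' method.
    """
    rng = random.Random(seed)
    x = x0
    trajectory = [x]
    for _ in range(n_steps):
        lr = rng.expovariate(1.0 / mean_lr)  # random learning rate, mean = mean_lr
        x = x - lr * noisy_grad(x, rng)
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    traj = random_lr_sgd()
    tail = traj[len(traj) // 2:]
    # Instead of settling on a single point, the iterates keep fluctuating;
    # the tail of the trajectory is a rough empirical picture of where they spend time.
    print("mean of tail iterates:", sum(tail) / len(tail))
    print("min / max of tail iterates:", min(tail), max(tail))
```

Because the iterates never stop moving, a natural object to study is the distribution they settle into over time, which is the viewpoint the abstract takes with its stationary-distribution analysis.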

Taxonomy

Topics: Neural Networks and Applications

Methods: Stochastic Gradient Descent