A Variational Analysis of Stochastic Gradient Algorithms
Stephan Mandt, Matthew D. Hoffman, and David M. Blei

TL;DR
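This paper interprets stochastic gradient descent (SGD) with a constant learning rate as an approximate posterior inference method in the spirit of variational inference. It models SGD as a continuous-time stochastic process and tunes SGD's parameters so that its stationary distribution is as close as possible, in Kullback-Leibler divergence, to the posterior.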
Contribution
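The paper models constant-rate SGD as a continuous-time stochastic process, specifically a multivariate Ornstein-Uhlenbeck process, and uses properties of that process to derive the tuning parameters that minimize the Kullback-Leibler divergence between SGD's stationary distribution and the target posterior.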
Findings
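With a constant learning rate, SGD can serve as an approximate posterior inference algorithm: after an initial convergence phase, its iterates are samples from a stationary distribution.
Constant-rate SGD can be modeled as a multivariate Ornstein-Uhlenbeck process.
SGD's tuning parameters can be set so that this stationary distribution matches the posterior as closely as possible.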
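
To make the Ornstein-Uhlenbeck connection concrete, here is a minimal sketch of the standard fact that makes such processes convenient to analyze. For a generic process dθ = -M θ dt + G dW, the stationary distribution is Gaussian with covariance Σ solving the Lyapunov equation M Σ + Σ Mᵀ = G Gᵀ. The symbols M and G, the function name `ou_stationary_cov`, and the numeric matrices are illustrative assumptions, not the paper's notation or values.

```python
import numpy as np

def ou_stationary_cov(M, G):
    """Solve M @ Sigma + Sigma @ M.T = G @ G.T for Sigma.

    Assumes all eigenvalues of M have positive real part, so that the
    process d(theta) = -M theta dt + G dW has a stationary distribution.
    """
    d = M.shape[0]
    I = np.eye(d)
    # With row-major flattening: vec(M Sigma)   = kron(M, I) @ vec(Sigma)
    #                            vec(Sigma M^T) = kron(I, M) @ vec(Sigma)
    lhs = np.kron(M, I) + np.kron(I, M)
    rhs = (G @ G.T).reshape(-1)
    return np.linalg.solve(lhs, rhs).reshape(d, d)

# Hypothetical drift and noise matrices, chosen only for illustration.
M = np.array([[1.0, 0.2],
              [0.0, 0.5]])
G = np.array([[0.3, 0.0],
              [0.1, 0.2]])

Sigma = ou_stationary_cov(M, G)
residual = M @ Sigma + Sigma @ M.T - G @ G.T
print("stationary covariance:\n", Sigma)
print("max |residual| of the Lyapunov equation:", np.abs(residual).max())
```

Because the stationary covariance is available in closed form as a function of the drift and noise, one can ask which settings bring it closest to a target covariance. The Kronecker-product solve is used here only to keep the sketch dependent on NumPy alone.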
Abstract
Stochastic Gradient Descent (SGD) is an important algorithm in machine learning. With constant learning rates, it is a stochastic process that, after an initial phase of convergence, generates samples from a stationary distribution. We show that SGD with constant rates can be effectively used as an approximate posterior inference algorithm for probabilistic modeling. Specifically, we show how to adjust the tuning parameters of SGD such as to match the resulting stationary distribution to the posterior. This analysis rests on interpreting SGD as a continuous-time stochastic process and then minimizing the Kullback-Leibler divergence between its stationary distribution and the target posterior. (This is in the spirit of variational inference.) In more detail, we model SGD as a multivariate Ornstein-Uhlenbeck process and then use properties of this process to derive the optimal parameters.…
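
The idea of matching SGD's stationary distribution to a posterior can be illustrated on a one-dimensional toy problem. The sketch below is not the paper's derivation. It assumes data drawn from a Gaussian with unknown mean and known unit variance, a flat prior (so the posterior is Gaussian with mean equal to the sample mean and variance 1/N), and minibatches sampled with replacement. In this special case the SGD iterates follow an AR(1) recursion whose stationary variance is eps * s2 / (S * (2 - eps)), so the step size that equates it with 1/N can be written down directly. All names (`constant_sgd_gaussian_mean`, `eps`, `s2`) are hypothetical.

```python
import numpy as np

def constant_sgd_gaussian_mean(x, batch_size, step_size, n_steps, burn_in, seed=0):
    """Constant-rate SGD on the loss 0.5 * mean((theta - x_i)**2).

    Returns the iterates after discarding the first `burn_in` steps.
    """
    rng = np.random.default_rng(seed)
    # Pre-draw every minibatch (with replacement) and reduce it to its mean.
    idx = rng.integers(0, len(x), size=(n_steps, batch_size))
    batch_means = x[idx].mean(axis=1)
    theta = 0.0
    trace = np.empty(n_steps)
    for t in range(n_steps):
        grad = theta - batch_means[t]   # minibatch gradient of the loss
        theta -= step_size * grad       # constant learning rate, no decay
        trace[t] = theta
    return trace[burn_in:]

rng = np.random.default_rng(1)
N, S = 1000, 10                         # dataset size and minibatch size
x = rng.normal(loc=3.0, scale=1.0, size=N)
s2 = x.var()                            # variance of one uniformly drawn data point

# Toy-specific step size: solves eps * s2 / (S * (2 - eps)) = 1 / N.
eps = 2 * S / (N * s2 + S)
samples = constant_sgd_gaussian_mean(x, S, eps, n_steps=200_000, burn_in=5_000)

posterior_var = 1.0 / N                 # known unit variance, flat prior
print("step size:", eps)
print("iterate mean vs. sample mean:", samples.mean(), x.mean())
print("iterate variance vs. posterior variance:", samples.var(), posterior_var)
```

For N much larger than S and s2 close to 1, this step size is roughly 2S/N, which shows how the learning rate and minibatch size jointly set the spread of the iterates. The iterates are autocorrelated, so the empirical variance is only an estimate and will fluctuate from run to run.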
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Markov Chains and Monte Carlo Methods
Methods: Stochastic Gradient Descent
