Convergence of Multi-Level Markov Chain Monte Carlo Adaptive Stochastic Gradient Algorithms

Antoine Godichon-Baggioni (LPSM); Gabriel Lang (MIA Paris-Saclay); Sylvain Le Corff; Julien Stoehr (CEREMADE); Sobihan Surendran

arXiv:2601.22799 · math.ST · February 2, 2026

TL;DR

This paper introduces a multilevel Markov chain Monte Carlo (MCMC) gradient estimator that reduces bias at only a logarithmic increase in expected cost, and integrates it into adaptive stochastic gradient algorithms, improving convergence in complex models such as autoencoders.

Contribution

It proposes a multilevel MCMC gradient estimator whose bias decays as O(T_n^{-1}) at an expected computational cost of only O(log T_n), and develops multilevel adaptive gradient algorithms (variants of Adagrad and AMSGrad) with proven convergence rates.
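The idea behind such an estimator can be pictured with a standard randomized single-term multilevel construction. This is a sketch under assumptions, not the authors' estimator, whose exact form this summary does not give: level l uses an MCMC average Y_l over the first 2^l steps of one chain (so Y_{l-1} reuses the same path), a single level L is drawn with probability p_l proportional to 2^{-l} up to a maximal level with T_n = 2^{max_level}, and the weighted difference (Y_L - Y_{L-1})/p_L is returned. Its expectation telescopes to E[Y_{max_level}], so its bias is that of a T_n-step MCMC average, while the expected number of simulated steps, the sum of p_l 2^l, grows only linearly in max_level, i.e. logarithmically in T_n. The function names and the AR(1) toy chain are hypothetical.

```python
import numpy as np


def level_probabilities(max_level: int) -> np.ndarray:
    """Assumed geometric level law: p_l proportional to 2^{-l}, l = 0..max_level."""
    weights = 2.0 ** -np.arange(max_level + 1)
    return weights / weights.sum()


def single_term(level: int, path_values: np.ndarray, probs: np.ndarray) -> float:
    """Weighted level difference Delta_l / p_l, with Delta_0 = Y_0.

    Y_l is the mean of the first 2^l entries of path_values, so the coarse
    average Y_{l-1} is computed from the same chain (nested coupling).
    """
    y_fine = path_values[: 2 ** level].mean()
    if level == 0:
        return float(y_fine / probs[0])
    y_coarse = path_values[: 2 ** (level - 1)].mean()
    return float((y_fine - y_coarse) / probs[level])


def mlmc_estimate(run_chain, max_level: int, rng: np.random.Generator) -> float:
    """One draw of the truncated single-term multilevel estimator.

    Only 2^L chain steps are simulated for the sampled level L, yet the
    expectation equals that of the full average over 2^max_level steps.
    """
    probs = level_probabilities(max_level)
    level = int(rng.choice(max_level + 1, p=probs))
    path_values = run_chain(2 ** level, rng)
    return single_term(level, path_values, probs)


def ar1_chain(n_steps: int, rng: np.random.Generator,
              rho: float = 0.9, x0: float = 5.0) -> np.ndarray:
    """Hypothetical toy chain with N(0, 1) stationary law, started away from it at x0.

    Returns h(X_k) = X_k for k = 1..n_steps; a real use would return
    per-step gradient evaluations at the MCMC samples instead.
    """
    x = x0
    out = np.empty(n_steps)
    for k in range(n_steps):
        x = rho * x + np.sqrt(1.0 - rho ** 2) * rng.standard_normal()
        out[k] = x
    return out
```

For example, with a shared generator `rng = np.random.default_rng(0)`, averaging `[mlmc_estimate(ar1_chain, 8, rng) for _ in range(10_000)]` approximates the expectation of a 256-step average started at x0 = 5, which still differs from the stationary mean 0 by about 45/256 ≈ 0.18, a bias of order 1/T_n.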

Findings

01

Bias decays as O(T_n^{-1}) while expected cost grows only logarithmically

02

New multilevel variants of Adagrad and AMSGrad are developed

03

Convergence rate of O(n^{-1/2}) up to logarithmic factors
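The summary does not spell out the multilevel update rules, so the following is only a sketch under stated assumptions: the standard AMSGrad and diagonal Adagrad recursions, with the stochastic gradient supplied by a pluggable oracle `grad_oracle(theta, n, rng)` (a hypothetical interface) at the point where a multilevel MCMC gradient estimate would enter. The mock oracle is a stand-in, not an MCMC estimator: the exact gradient of a quadratic plus Gaussian noise plus a bias of order 1/T_n under the hypothetical schedule T_n = n. The lr / sqrt(n) step-size decay used for AMSGrad is likewise an assumption.

```python
import numpy as np

TARGET = np.array([1.0, -2.0])  # hypothetical minimizer of the toy objective


def mock_biased_oracle(theta, n, rng, sigma=0.1, c=1.0):
    """Stand-in for a multilevel MCMC gradient of 0.5 * ||theta - TARGET||^2.

    Adds Gaussian noise and a bias c / T_n with the hypothetical schedule T_n = n.
    """
    return theta - TARGET + sigma * rng.standard_normal(theta.shape) + c / n


def amsgrad(grad_oracle, theta0, n_iter, lr=0.05, beta1=0.9, beta2=0.999,
            eps=1e-8, seed=0):
    """AMSGrad with step size lr / sqrt(n); grad_oracle(theta, n, rng) returns a gradient estimate."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    v_hat = np.zeros_like(theta)
    for n in range(1, n_iter + 1):
        g = grad_oracle(theta, n, rng)
        m = beta1 * m + (1.0 - beta1) * g           # first-moment (momentum) estimate
        v = beta2 * v + (1.0 - beta2) * g ** 2      # second-moment estimate
        v_hat = np.maximum(v_hat, v)                # AMSGrad: keep the running maximum
        theta -= lr / np.sqrt(n) * m / (np.sqrt(v_hat) + eps)
    return theta


def adagrad(grad_oracle, theta0, n_iter, lr=0.5, eps=1e-8, seed=0):
    """Diagonal Adagrad: per-coordinate step lr / sqrt(sum of squared past gradients)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    g_sq_sum = np.zeros_like(theta)
    for n in range(1, n_iter + 1):
        g = grad_oracle(theta, n, rng)
        g_sq_sum += g ** 2
        theta -= lr * g / (np.sqrt(g_sq_sum) + eps)
    return theta
```

Keeping the oracle separate from the optimizer means the same recursions can be driven by a plain MCMC average or by a multilevel estimate; only the bias and moments of `g` change, which is what the convergence conditions in the abstract control.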

Abstract

Stochastic optimization in learning and inference often relies on Markov chain Monte Carlo (MCMC) to approximate gradients when exact computation is intractable. However, finite-time MCMC estimators are biased, and reducing this bias typically comes at a higher computational cost. We propose a multilevel Monte Carlo gradient estimator whose bias decays as $O(T_n^{-1})$ while its expected computational cost grows only as $O(\log T_n)$, where $T_n$ is the maximal truncation level at iteration $n$. Building on this approach, we introduce a multilevel MCMC framework for adaptive stochastic gradient methods, leading to new multilevel variants of the Adagrad and AMSGrad algorithms. Under conditions controlling the estimator bias and its second and third moments, we establish a convergence rate of order $O(n^{-1/2})$ up to logarithmic factors. Finally, we illustrate these results on…
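The two rates can be checked on a case where everything is explicit. For illustration only, the sketch below assumes a Gaussian AR(1) chain X_{k+1} = rho X_k + sqrt(1 - rho^2) eps_k started at x0 (stationary mean 0), so that E[X_k] = rho^k x0 and the bias of the T-step average has the closed form x0 rho (1 - rho^T) / ((1 - rho) T) = O(1/T). For the cost it assumes chain lengths 2^l, l = 0..L, with T_n = 2^L and level probabilities p_l proportional to 2^{-l}, which gives an expected number of steps (L + 1) / (2 - 2^{-L}) = O(log T_n). Neither the chain nor the level law is taken from the paper, and the function names are hypothetical.

```python
import math


def ar1_average_bias(T: int, rho: float = 0.9, x0: float = 5.0) -> float:
    """Exact bias of (1/T) * sum_{k=1}^T X_k for the toy AR(1) chain started at x0.

    E[X_k] = rho^k * x0 and the stationary mean is 0, so the bias is a geometric sum / T.
    """
    return x0 * rho * (1.0 - rho ** T) / ((1.0 - rho) * T)


def expected_multilevel_cost(T: int) -> float:
    """Expected chain steps sum_{l=0}^L p_l 2^l with p_l proportional to 2^{-l} and 2^L = T."""
    L = int(math.log2(T))
    return (L + 1) / (2.0 - 2.0 ** -L)


for T in (2 ** 4, 2 ** 8, 2 ** 12, 2 ** 16):
    # T * bias settles at x0 * rho / (1 - rho) = 45, while the cost grows roughly like (log2 T + 1) / 2.
    print(T, T * ar1_average_bias(T), expected_multilevel_cost(T))
```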

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Gaussian Processes and Bayesian Inference