Convergence of Stochastic Gradient Descent with mini-batching and infinite variance

Bartosz Glowacki; Rafal Kulik; Philippe Soulier

arXiv:2605.07184·math.PR·May 11, 2026

TL;DR

This paper analyzes mini-batched stochastic gradient descent under heavy-tailed gradient noise, establishing convergence rates and distributional limits when the noise lies in the domain of attraction of an alpha-stable law with alpha in (1, 2).

Contribution

It extends the theory of SGD with increasing batch sizes to heavy-tailed, infinite-variance gradient noise, providing moment bounds on the SGD error and limit distributions for the normalized iterates.

Findings

01

Increasing batch sizes accelerate convergence.

02

With increasing batch sizes, SGD converges in probability even with a constant stepsize.

03

Properly normalized SGD iterates converge in distribution to the stationary law of an Ornstein-Uhlenbeck process driven by an alpha-stable Lévy process.

Abstract

Stochastic gradient descent (SGD) with mini-batching is a standard tool in large-scale optimization, yet its theoretical properties under heavy-tailed gradient noise remain largely unexplored. In this paper we study SGD with increasing batch sizes when the gradient noise belongs to the domain of attraction of an $\alpha$-stable law with $\alpha \in (1, 2)$. Building on existing results for the finite-variance regime and for heavy-tailed SGD without batching, we establish three main results. First, we derive $L^{p}$ moment bounds for the SGD error and show that increasing batch sizes lead to faster convergence rates. In particular, batching enables convergence in probability even for a constant stepsize. Second, we prove that the properly normalized SGD iterates converge in distribution to the stationary law of an Ornstein-Uhlenbeck process driven by an $\alpha$-stable Lévy process. Third,…
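The setting described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes a one-dimensional quadratic objective f(x) = x^2 / 2, Student-t gradient noise with 1.5 degrees of freedom (finite mean, infinite variance, tail index 1.5, hence in the domain of attraction of a 1.5-stable law), and a hypothetical batch schedule b_n = ceil(n^growth). The names `batch_size`, `minibatch_sgd`, and `growth` are illustrative choices only.

```python
import numpy as np


def batch_size(n, growth=1.0):
    """Hypothetical batch schedule b_n = ceil(n**growth); growth=0 means no batching."""
    return int(np.ceil(n ** growth))


def minibatch_sgd(x0=5.0, n_steps=200, stepsize=0.5, alpha=1.5, growth=1.0, seed=0):
    """Constant-stepsize SGD on f(x) = x**2 / 2 with heavy-tailed gradient noise.

    Assumption: the noise is Student-t with `alpha` degrees of freedom, which for
    1 < alpha < 2 has zero mean and infinite variance.
    Returns the full path of iterates, including the starting point.
    """
    rng = np.random.default_rng(seed)
    x = float(x0)
    path = np.empty(n_steps + 1)
    path[0] = x
    for n in range(1, n_steps + 1):
        b = batch_size(n, growth)
        # Draw b independent noisy gradients and average them (the mini-batch).
        noise = rng.standard_t(df=alpha, size=b)
        grad_estimate = x + noise.mean()  # true gradient of x**2 / 2 is x
        x -= stepsize * grad_estimate
        path[n] = x
    return path


if __name__ == "__main__":
    no_batching = minibatch_sgd(growth=0.0)
    growing_batches = minibatch_sgd(growth=1.0)
    print("median |x_n| over last 50 steps, batch size 1:  ",
          np.median(np.abs(no_batching[-50:])))
    print("median |x_n| over last 50 steps, batch size n:  ",
          np.median(np.abs(growing_batches[-50:])))
```

Running the script compares the late-iterate error with and without a growing batch size under the same constant stepsize. A single seeded run is only an illustration of the setup, not evidence for the rates proved in the paper.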
