Towards Continuous-Time Approximations for Stochastic Gradient Descent without Replacement

Stefan Perko

arXiv:2512.04703 · cs.LG · December 5, 2025

Open Access

TL;DR

This paper introduces a novel continuous-time approximation for stochastic gradient descent without replacement (SGDo), providing theoretical convergence guarantees and insights into its learning dynamics.

Contribution

It proposes a stochastic continuous-time model for SGDo using Young differential equations driven by epoched Brownian motion, with proven convergence for strongly convex objectives.

Findings

01 Proves almost sure convergence of the approximation for strongly convex objectives and learning rate schedules $u_t = \frac{1}{(1 + t)^{\beta}}$ with $\beta \in (0, 1)$.

02 Derives an upper bound on the asymptotic rate of almost sure convergence.

03 Shows that this upper bound is as good as or better than previous SGDo results.

Abstract

Gradient optimization algorithms using epochs, that is, those based on stochastic gradient descent without replacement (SGDo), are predominantly used to train machine learning models in practice. However, the mathematical theory of SGDo and related algorithms remains underexplored compared to their "with replacement" and "one-pass" counterparts. In this article, we propose a stochastic, continuous-time approximation to SGDo with additive noise based on a Young differential equation driven by a stochastic process we call an "epoched Brownian motion". We show its usefulness by proving the almost sure convergence of the continuous-time approximation for strongly convex objectives and learning rate schedules of the form $u_t = \frac{1}{(1 + t)^{\beta}}$, $\beta \in (0, 1)$. Moreover, we compute an upper bound on the asymptotic rate of almost sure convergence, which is as good as or better than previous…
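To make the discrete algorithm in the abstract concrete, the following is a minimal sketch of SGDo (a fresh shuffle each epoch, every sample used exactly once per epoch) with the schedule $u_t = \frac{1}{(1 + t)^{\beta}}$. It is not the paper's continuous-time model. The one-dimensional quadratic objective, the choice to let $t$ count individual iterations, and all function and parameter names are assumptions made for illustration only.

```python
import random


def learning_rate(t, beta):
    """Schedule u_t = 1 / (1 + t)^beta from the abstract, with beta in (0, 1)."""
    return 1.0 / (1.0 + t) ** beta


def sgd_without_replacement(data, x0, beta=0.75, epochs=100, seed=0):
    """Run SGDo on f(x) = (1/n) * sum_i 0.5 * (x - a_i)^2.

    This objective is strongly convex and its minimiser is the mean of
    `data`. The objective and the convention that t counts iterations
    (not epochs) are illustrative assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    n = len(data)
    x = x0
    t = 0
    for _ in range(epochs):
        order = list(range(n))
        rng.shuffle(order)  # new permutation: sampling without replacement
        for i in order:
            grad = x - data[i]  # gradient of 0.5 * (x - a_i)^2
            x -= learning_rate(t, beta) * grad
            t += 1
    return x


data = [float(i) for i in range(10)]  # minimiser is the mean, 4.5
x_final = sgd_without_replacement(data, x0=20.0)
print(x_final)
```

Replacing the per-epoch shuffle with independent draws such as `rng.randrange(n)` would give the "with replacement" counterpart that the abstract contrasts SGDo against.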

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Markov Chains and Monte Carlo Methods