Convergence Rate for the Last Iterate of Stochastic Gradient Descent Schemes

Marcel Hudiani

arXiv:2507.07281 · math.OC · March 11, 2026

TL;DR

This paper analyzes the last-iterate convergence rates of stochastic gradient descent (SGD) and stochastic heavy ball (SHB) for convex and non-convex objectives with Hölder continuous gradients, using the discrete Gronwall inequality.

Contribution

It provides new convergence rate results for SGD and SHB without relying on the Robbins-Siegmund theorem, including probabilistic bounds for SHB with constant momentum on convex functions.

Findings

01

For non-convex objectives, SGD and SHB achieve $\min_{s \leq t} \|\nabla F(w_s)\|^2 = o(t^{p-1})$.

02

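For convex functions, SHB with constant momentum $\beta \in (0, 1)$ attains, in probability, a polynomial rate with a squared-logarithmic factor: $F(w_t) - F_* = O(t^{\max(p-1, -2p+1)} \log^2 \frac{t}{\delta})$.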

03

The paper recovers known results and extends them to broader settings with Hölder continuous gradients.

Abstract

We study the convergence rate for the last iterate of stochastic gradient descent (SGD) and stochastic heavy ball (SHB) in the parametric setting when the objective function $F$ is globally convex or non-convex whose gradient is $\gamma$-Hölder. Using only discrete Gronwall's inequality without Robbins-Siegmund theorem, we recover results for both SGD and SHB: $\min_{s \leq t} \|\nabla F(w_s)\|^2 = o(t^{p-1})$ for non-convex objectives and $F(w_{\tau \land t}) - F_* = o(t^{2\gamma/(1+\gamma) \cdot \max(p-1, -2p+1) - \epsilon})$ for $\beta \in (0, 1)$, $\tau := \inf\{t > 0 : F(w_t) = F_*\}$, and $\min_{s \leq t} F(w_s) - F_* = o(t^{p-1})$ for convex objectives $F$ whose minimum is $F_*$. In addition, we proved that SHB with constant momentum parameter $\beta \in (0, 1)$ attains a convergence rate of $F(w_t) - F_* = O(t^{\max(p-1, -2p+1)} \log^2 \frac{t}{\delta})$ with probability at…
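To make the two schemes concrete, here is a minimal sketch of the SHB update with constant momentum $\beta$, which reduces to plain SGD when $\beta = 0$. Several things here are assumptions rather than details taken from the paper: the step size $\alpha_t = c \, t^{-p}$ (the abstract uses the exponent $p$ without defining it), the toy objective $F(w) = \frac{1}{2}\|w\|^2$ with $F_* = 0$, the Gaussian gradient noise, and all function and parameter names.

```python
import random


def objective(w):
    # Toy convex objective F(w) = 0.5 * ||w||^2, with minimum F_* = 0 (assumed).
    return 0.5 * sum(wi * wi for wi in w)


def noisy_grad(w, rng, sigma):
    # The exact gradient of the toy objective is w; add Gaussian noise (assumed noise model).
    return [wi + rng.gauss(0.0, sigma) for wi in w]


def shb(w0, steps, p=0.75, beta=0.5, c=0.5, sigma=0.1, seed=0):
    """Stochastic heavy ball; beta = 0 gives plain SGD.

    Assumed update: w_{t+1} = w_t - alpha_t * g_t + beta * (w_t - w_{t-1}),
    with assumed step size alpha_t = c * t**(-p).
    Returns the last iterate.
    """
    rng = random.Random(seed)
    w_prev = list(w0)
    w = list(w0)
    for t in range(1, steps + 1):
        alpha = c * t ** (-p)
        g = noisy_grad(w, rng, sigma)
        w_next = [
            wi - alpha * gi + beta * (wi - wpi)
            for wi, gi, wpi in zip(w, g, w_prev)
        ]
        w_prev, w = w, w_next
    return w


if __name__ == "__main__":
    start = [5.0, -3.0]
    print("SGD last-iterate gap:", objective(shb(start, 2000, beta=0.0)))
    print("SHB last-iterate gap:", objective(shb(start, 2000, beta=0.5)))
```

The printed values are the last-iterate gaps $F(w_t) - F_*$ for one random seed. A single run like this only illustrates the updates and does not verify the stated rates.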

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Stochastic processes and financial applications