Towards Noise-adaptive, Problem-adaptive (Accelerated) Stochastic Gradient Descent

Sharan Vaswani; Benjamin Dubois-Taine; Reza Babanezhad

arXiv:2110.11442 · math.OC · March 24, 2026 · 1 citation

Open Access

TL;DR

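This paper develops stochastic gradient methods, including accelerated variants, that adapt to the gradient noise and to problem-dependent constants such as the smoothness. With known smoothness, SGD with exponentially decreasing step-sizes achieves a near-optimal rate without knowing the noise level; when the smoothness is estimated with a stochastic line-search, convergence is guaranteed only to a neighbourhood of the solution. The results are supported by theoretical analysis and experiments.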

Contribution

The paper analyzes SGD and accelerated SGD with exponentially decreasing step-sizes, which adapt to the noise in the stochastic gradients, and combines them with a stochastic line-search or an offline smoothness estimate to adapt to problem-dependent constants.

Findings

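01 On smooth, strongly-convex problems, SGD with exponentially decreasing step-sizes and known smoothness achieves a near-optimal $\tilde{O}\left(\exp\left(-\frac{T}{\kappa}\right) + \frac{\sigma^{2}}{T}\right)$ rate without knowing $\sigma^{2}$.

02 A stochastic line-search (SLS) makes SGD adaptive to the smoothness, but the resulting method converges only to a neighbourhood of the solution.

03 Accelerated SGD (ASGD) attains a faster rate without knowledge of the noise variance $\sigma^{2}$.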
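
To make the first finding concrete, here is a minimal Python sketch of SGD with exponentially decreasing step-sizes on a toy quadratic. The schedule $\eta_k = \alpha^{k}/L$ with $\alpha = (\beta/T)^{1/T}$ is one common parameterisation and is an assumption here, since this page does not state the exact formula. The function names, the test problem, and the Gaussian noise model are likewise hypothetical choices made for illustration.

```python
import math
import random


def exponential_step_sizes(T, L, beta=1.0):
    """Return eta_k = alpha**k / L for k = 1..T, with alpha = (beta / T)**(1 / T).

    Assumed schedule: it decays from roughly 1/L down to beta / (T * L).
    """
    alpha = (beta / T) ** (1.0 / T)
    return [(alpha ** k) / L for k in range(1, T + 1)]


def sgd_exponential(a, sigma, T, seed=0):
    """Run SGD on f(w) = 0.5 * sum_i a[i] * w[i]**2 with noisy gradients.

    The smoothness constant L = max(a) is assumed known; sigma is used only
    to simulate the noise and never enters the step-size.
    """
    rng = random.Random(seed)
    L = max(a)
    w = [1.0] * len(a)
    for eta in exponential_step_sizes(T, L):
        # Stochastic gradient: exact gradient a[i] * w[i] plus N(0, sigma^2) noise.
        g = [ai * wi + rng.gauss(0.0, sigma) for ai, wi in zip(a, w)]
        w = [wi - eta * gi for wi, gi in zip(w, g)]
    return 0.5 * sum(ai * wi * wi for ai, wi in zip(a, w))


if __name__ == "__main__":
    final = sgd_exponential(a=[1.0, 10.0], sigma=0.1, T=2000)
    print(f"f(w_T) = {final:.3e}")
```

Note that the step-size depends only on `T`, `L`, and `beta`; the noise level `sigma` appears solely in the simulated gradients, which is the sense in which such a schedule needs no knowledge of the noise.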

Abstract

We aim to make stochastic gradient descent (SGD) adaptive to (i) the noise $\sigma^{2}$ in the stochastic gradients and (ii) problem-dependent constants. When minimizing smooth, strongly-convex functions with condition number $\kappa$, we prove that $T$ iterations of SGD with exponentially decreasing step-sizes and knowledge of the smoothness can achieve an $\tilde{O}\left(\exp\left(-\frac{T}{\kappa}\right) + \frac{\sigma^{2}}{T}\right)$ rate, without knowing $\sigma^{2}$. In order to be adaptive to the smoothness, we use a stochastic line-search (SLS) and show (via upper and lower bounds) that SGD with SLS converges at the desired rate, but only to a neighbourhood of the solution. On the other hand, we prove that SGD with an offline estimate of the smoothness converges to the minimizer. However, its rate is slowed down in proportion to the estimation error. Next, we prove that SGD with…
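
The stochastic line-search mentioned in the abstract can be illustrated with a backtracking Armijo search evaluated on the sampled loss. The following Python sketch is an assumption-laden illustration rather than the paper's algorithm: the constants `eta_max`, `c`, and `shrink`, the function names, and the least-squares test problem are all hypothetical. The toy data are exactly fit by a linear model, so there is no gradient noise at the solution and the iterates reach the minimizer; with noisy data one would expect convergence only to a neighbourhood, as the abstract states.

```python
import random


def armijo_step(f_i, grad_i, w, eta_max=1.0, c=0.5, shrink=0.5, max_backtracks=50):
    """Take one SGD step with a backtracking Armijo search on the sampled loss f_i.

    Accepts the first eta with f_i(w - eta * g) <= f_i(w) - c * eta * ||g||^2.
    """
    g = grad_i(w)
    g_sq = sum(gj * gj for gj in g)
    f_w = f_i(w)
    eta = eta_max
    for _ in range(max_backtracks):
        w_new = [wj - eta * gj for wj, gj in zip(w, g)]
        if f_i(w_new) <= f_w - c * eta * g_sq:
            return w_new, eta
        eta *= shrink
    # Fall back to the smallest step-size tried if no eta was accepted.
    return [wj - eta * gj for wj, gj in zip(w, g)], eta


def sgd_sls(xs, ys, T, seed=0):
    """Run SGD with the line-search above on f_i(w) = 0.5 * (w . x_i - y_i)**2."""
    rng = random.Random(seed)
    w = [0.0] * len(xs[0])
    for _ in range(T):
        i = rng.randrange(len(xs))
        x, y = xs[i], ys[i]

        def f_i(v, x=x, y=y):
            return 0.5 * (sum(vj * xj for vj, xj in zip(v, x)) - y) ** 2

        def grad_i(v, x=x, y=y):
            r = sum(vj * xj for vj, xj in zip(v, x)) - y
            return [r * xj for xj in x]

        # The same sample i is used for both the gradient and the search.
        w, _ = armijo_step(f_i, grad_i, w)
    return w
```

No smoothness constant is passed in anywhere: the search finds an acceptable step-size for each sampled loss on its own, at the cost of extra function evaluations per iteration.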

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Markov Chains and Monte Carlo Methods

Methods: Stochastic Gradient Descent