Towards Noise-adaptive, Problem-adaptive (Accelerated) Stochastic Gradient Descent
Sharan Vaswani, Benjamin Dubois-Taine, Reza Babanezhad

TL;DR
This paper develops noise- and problem-adaptive stochastic gradient methods, including accelerated variants, that achieve near-optimal convergence rates without prior knowledge of noise levels or smoothness, validated through theoretical analysis and experiments.
Contribution
The paper introduces adaptive SGD algorithms that combine exponentially decreasing step-sizes with a stochastic line-search, so that the methods adapt to the gradient noise and to problem-dependent constants such as the smoothness.
Findings
SGD with exponential step-sizes achieves near-optimal convergence rates.
SGD with a stochastic line-search adapts to the smoothness, but converges only to a neighborhood of the solution.
Accelerated SGD (ASGD) attains faster rates without knowledge of noise variance.
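To make the first finding concrete, here is a minimal sketch of SGD with exponentially decreasing step-sizes on a toy noisy quadratic. The schedule used here, eta_k = (1/L) * alpha^k with alpha = (beta/T)^(1/T), is an assumption for illustration, since this summary does not state the exact schedule. The function names, the toy problem, and the noise level are likewise hypothetical.

```python
import random


def exponential_step_sizes(L, T, beta=1.0):
    """Step-sizes eta_k = (1/L) * alpha**k for k = 1..T.

    alpha = (beta / T) ** (1 / T) is an assumed schedule; it makes the
    final step-size equal to beta / (L * T).
    """
    alpha = (beta / T) ** (1.0 / T)
    return [(1.0 / L) * alpha ** k for k in range(1, T + 1)]


def sgd_exponential(noisy_grad, w0, L, T, beta=1.0, seed=0):
    """Run T SGD iterations with the exponential schedule above."""
    rng = random.Random(seed)
    w = list(w0)
    for eta in exponential_step_sizes(L, T, beta):
        g = noisy_grad(w, rng)
        w = [wi - eta * gi for wi, gi in zip(w, g)]
    return w


# Toy problem (hypothetical): f(w) = 0.5 * sum_i A[i] * w[i]**2,
# strongly convex with mu = 1 and smooth with L = 10.
A = [1.0, 10.0]


def noisy_quadratic_grad(w, rng, sigma=0.1):
    """Exact gradient plus zero-mean Gaussian noise of scale sigma."""
    return [a * wi + sigma * rng.gauss(0.0, 1.0) for a, wi in zip(A, w)]


steps = exponential_step_sizes(L=10.0, T=2000)
w_final = sgd_exponential(noisy_quadratic_grad, [5.0, -3.0], L=10.0, T=2000)
```

Note that the schedule needs the smoothness constant L and the horizon T, but not the noise level sigma, which matches the adaptivity the finding describes.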
Abstract
We aim to make stochastic gradient descent (SGD) adaptive to (i) the noise $\sigma^2$ in the stochastic gradients and (ii) problem-dependent constants. When minimizing smooth, strongly-convex functions with condition number $\kappa$, we prove that $T$ iterations of SGD with exponentially decreasing step-sizes and knowledge of the smoothness can achieve an $\tilde{O}(\exp(-T/\kappa) + \sigma^2/T)$ rate, without knowing $\sigma^2$. In order to be adaptive to the smoothness, we use a stochastic line-search (SLS) and show (via upper and lower-bounds) that SGD with SLS converges at the desired rate, but only to a neighbourhood of the solution. On the other hand, we prove that SGD with an offline estimate of the smoothness converges to the minimizer. However, its rate is slowed down proportional to the estimation error. Next, we prove that SGD with…
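The stochastic line-search mentioned in the abstract can be illustrated with a backtracking Armijo search evaluated on the sampled function only. This is a sketch under assumptions rather than the paper's exact procedure: the Armijo constant `c`, the shrink factor, `eta_max`, and all function names are hypothetical choices.

```python
def armijo_step(f_i, grad_i, w, eta_max=1.0, c=0.5, shrink=0.5, max_backtracks=50):
    """Backtrack from eta_max until the stochastic Armijo condition holds:

        f_i(w - eta * g) <= f_i(w) - c * eta * ||g||^2,   g = grad_i(w),

    where f_i and grad_i refer to the same sampled function.
    If max_backtracks is exhausted, the last shrunken eta is returned
    without having been checked.
    """
    g = grad_i(w)
    g_sq = sum(x * x for x in g)
    f_w = f_i(w)
    eta = eta_max
    for _ in range(max_backtracks):
        w_new = [wi - eta * gi for wi, gi in zip(w, g)]
        if f_i(w_new) <= f_w - c * eta * g_sq:
            break
        eta *= shrink
    return eta


# Hypothetical sampled function: f_i(w) = 0.5 * (2 * w)^2, smoothness L_i = 4.
def f_sample(w):
    return 0.5 * (2.0 * w[0]) ** 2


def grad_sample(w):
    return [4.0 * w[0]]


eta = armijo_step(f_sample, grad_sample, [1.0])
```

For this quadratic the accepted step is at most 1 / L_i, so the search recovers a usable step-size without being given the smoothness constant.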
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Markov Chains and Monte Carlo Methods
Methods: Stochastic Gradient Descent
