The Speed-Robustness Trade-Off for First-Order Methods with Additive Gradient Noise
Bryan Van Scoy, Laurent Lessard

TL;DR
This paper investigates the fundamental trade-off between convergence speed and robustness to gradient noise in first-order optimization methods, proposing new algorithms with tunable parameters to balance these competing objectives.
Contribution
It introduces a tractable framework for analyzing and designing first-order methods that explicitly trade off convergence rate against noise sensitivity, covering accelerated methods such as Polyak's Heavy Ball (HB) and Nesterov's Fast Gradient (FG).
Findings
New algorithms with tunable parameters effectively balance speed and robustness.
Convergence rate and noise sensitivity can be computed tractably for a broad family of first-order methods on strongly convex quadratics and on smooth strongly convex functions.
Numerical validation shows improved trade-offs over existing methods.
Abstract
We study the trade-off between convergence rate and sensitivity to stochastic additive gradient noise for first-order optimization methods. Ordinary Gradient Descent (GD) can be made fast-and-sensitive or slow-and-robust by increasing or decreasing the stepsize, respectively. However, it is not clear how such a trade-off can be navigated when working with accelerated methods such as Polyak's Heavy Ball (HB) or Nesterov's Fast Gradient (FG) methods. We consider two classes of functions: (1) strongly convex quadratics and (2) smooth strongly convex functions. For each function class, we present a tractable way to compute the convergence rate and sensitivity to additive gradient noise for a broad family of first-order methods, and we present algorithm designs that trade off these competing performance metrics. Each design consists of a simple analytic update rule with two states of memory,…
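The Gradient Descent trade-off described in the abstract can be illustrated with a minimal sketch in Python. It is not the paper's framework or any of its proposed algorithms: it assumes a diagonal strongly convex quadratic with eigenvalues `eigs`, i.i.d. Gaussian gradient noise with standard deviation `sigma`, and it takes "sensitivity" to mean the steady-state root-mean-square distance to the minimizer, which may differ from the paper's exact definition. The function names `gd_rate_and_sensitivity` and `simulate_noisy_gd` are hypothetical.

```python
import numpy as np


def gd_rate_and_sensitivity(alpha, eigs, sigma=1.0):
    """Closed-form rate and steady-state RMS error of noisy GD.

    Assumes f(x) = 0.5 * sum(eigs * x**2) and the noisy update
    x <- x - alpha * (grad f(x) + w), with w ~ N(0, sigma^2 I).
    """
    eigs = np.asarray(eigs, dtype=float)
    contraction = 1.0 - alpha * eigs          # per-coordinate factor
    rate = float(np.max(np.abs(contraction)))  # worst-case linear rate
    if rate >= 1.0:
        raise ValueError("stepsize too large: iteration does not converge")
    # Each coordinate obeys e <- c*e - alpha*w, whose stationary
    # variance is alpha^2 * sigma^2 / (1 - c^2).
    variance = alpha**2 * sigma**2 / (1.0 - contraction**2)
    return rate, float(np.sqrt(variance.sum()))


def simulate_noisy_gd(alpha, eigs, sigma=1.0, iters=20000, burn_in=2000, seed=0):
    """Monte Carlo estimate of the steady-state RMS error (illustrative)."""
    rng = np.random.default_rng(seed)
    eigs = np.asarray(eigs, dtype=float)
    x = np.ones_like(eigs)
    total = 0.0
    for k in range(iters):
        noisy_grad = eigs * x + sigma * rng.standard_normal(eigs.shape)
        x = x - alpha * noisy_grad
        if k >= burn_in:                       # discard the transient
            total += float(x @ x)
    return float(np.sqrt(total / (iters - burn_in)))


if __name__ == "__main__":
    eigs = [1.0, 10.0]                         # assumed curvature range
    for alpha in (0.05, 0.18):
        rate, sens = gd_rate_and_sensitivity(alpha, eigs)
        est = simulate_noisy_gd(alpha, eigs)
        print(f"alpha={alpha}: rate={rate:.3f}, "
              f"sensitivity={sens:.3f}, simulated={est:.3f}")
```

Under these assumptions, the larger stepsize gives a smaller contraction factor (faster convergence) but a larger steady-state error, which is the fast-and-sensitive versus slow-and-robust behavior the abstract attributes to GD.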
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Advanced Bandit Algorithms Research
