Nesterov acceleration despite very noisy gradients

Kanan Gupta; Jonathan W. Siegel; Stephan Wojtowytsch

arXiv:2302.05515 · stat.ML · November 4, 2024

TL;DR

This paper introduces AGNES, a generalization of Nesterov's accelerated gradient descent that retains accelerated convergence rates for smooth convex and strongly convex minimization even when the gradient noise is large, provided the noise intensity is proportional to the gradient magnitude. This noise model is motivated by overparametrized machine learning.

Contribution

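AGNES extends Nesterov's method to noisy gradient estimates using only two parameters in the convex case and three in the strongly convex case. Nesterov's method accelerates only when the constant of proportionality between noise and gradient is below 1, whereas AGNES achieves acceleration at any signal-to-noise ratio.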

Findings

01

AGNES achieves acceleration when the noise intensity of the gradient estimates is proportional to the gradient magnitude, for any constant of proportionality.

02

AGNES requires only two parameters for convex and three for strongly convex minimization tasks, fewer than existing methods.

03

The paper provides geometric interpretations of the parameters and heuristics for choosing them.

Abstract

We present a generalization of Nesterov's accelerated gradient descent algorithm. Our algorithm (AGNES) provably achieves acceleration for smooth convex and strongly convex minimization tasks with noisy gradient estimates if the noise intensity is proportional to the magnitude of the gradient at every point. Nesterov's method converges at an accelerated rate if the constant of proportionality is below 1, while AGNES accommodates any signal-to-noise ratio. The noise model is motivated by applications in overparametrized machine learning. AGNES requires only two parameters in convex and three in strongly convex minimization tasks, improving on existing methods. We further provide clear geometric interpretations and heuristics for the choice of parameters.
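The noise model in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes the model is formalized as an unbiased estimate g with E||g - ∇f(x)||² = σ² ||∇f(x)||², and it uses isotropic Gaussian noise and a toy quadratic objective purely for illustration. The names `noisy_gradient`, `relative_noise_level`, and `quadratic_grad` are hypothetical helpers.

```python
import numpy as np


def noisy_gradient(grad, x, sigma, rng):
    """Gradient estimate with multiplicative noise (assumed model).

    The estimate is unbiased and satisfies
    E||g - grad(x)||^2 = sigma^2 * ||grad(x)||^2.
    """
    g = grad(x)
    # Isotropic Gaussian noise, scaled so that its expected squared norm
    # equals sigma^2 * ||g||^2 (each coordinate contributes equally).
    scale = sigma * np.linalg.norm(g) / np.sqrt(g.size)
    return g + scale * rng.standard_normal(g.shape)


def relative_noise_level(grad, x, sigma, rng, samples=20000):
    """Monte Carlo estimate of E||g - grad(x)||^2 / ||grad(x)||^2."""
    g = grad(x)
    errors = [
        np.sum((noisy_gradient(grad, x, sigma, rng) - g) ** 2)
        for _ in range(samples)
    ]
    return np.mean(errors) / np.sum(g ** 2)


def quadratic_grad(x):
    """Gradient of the toy objective f(x) = 0.5 * ||x||^2 (illustrative choice)."""
    return x
```

Calling `relative_noise_level(quadratic_grad, np.ones(10), 3.0, np.random.default_rng(0))` should return a value close to σ² = 9, a regime in which the noise is much larger than the gradient itself. Under this model the noise vanishes wherever the gradient does, so the estimate is exact at a minimizer.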

Videos

Nesterov acceleration despite very noisy gradients · SlidesLive

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Domain Adaptation and Few-Shot Learning