AutoSGD: Automatic Learning Rate Selection for Stochastic Gradient Descent

Nikola Surjanovic; Alexandre Bouchard-Côté; Trevor Campbell

arXiv:2505.21651 · cs.LG · May 29, 2025

Open Access

TL;DR

AutoSGD is an SGD method that automatically decides, at a given iteration, whether to increase or decrease the learning rate, reducing the manual effort needed to tune a learning rate schedule.

Contribution

We propose AutoSGD, an automatic learning rate adjustment algorithm with theoretical convergence guarantees for SGD and standard gradient descent.

Findings

01

Empirical results suggest that AutoSGD performs strongly on a variety of traditional optimization problems and machine learning tasks.

02

Theoretical convergence of AutoSGD is established for stochastic and deterministic cases.

03

Empirical results demonstrate reduced tuning effort and improved efficiency.

Abstract

The learning rate is an important tuning parameter for stochastic gradient descent (SGD) and can greatly influence its performance. However, appropriate selection of a learning rate schedule across all iterations typically requires a non-trivial amount of user tuning effort. To address this, we introduce AutoSGD: an SGD method that automatically determines whether to increase or decrease the learning rate at a given iteration and then takes appropriate action. We introduce theory supporting the convergence of AutoSGD, along with its deterministic counterpart for standard gradient descent. Empirical results suggest strong performance of the method on a variety of traditional optimization problems and machine learning tasks.
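The abstract describes the mechanism only at a high level: decide whether the learning rate should go up or down, then act on that decision. The following is a minimal sketch of that general idea for the deterministic gradient descent case. It is not the authors' algorithm. The decision rule (try a halved rate, the current rate, and a doubled rate, and keep whichever gives the lowest objective after one step), the quadratic test objective, and all names (`adaptive_gd`, `factor`, `scales`) are assumptions made for illustration.

```python
# Illustrative sketch only: a simple "increase or decrease" learning rate rule
# for plain gradient descent. The rule and all names are assumptions, not the
# method from the paper.

def objective(x, scales):
    # Quadratic test function f(x) = 0.5 * sum_i scales[i] * x[i]^2.
    return 0.5 * sum(s * xi * xi for s, xi in zip(scales, x))


def gradient(x, scales):
    # Gradient of the quadratic above.
    return [s * xi for s, xi in zip(scales, x)]


def step(x, grad, lr):
    # One gradient descent step with learning rate lr.
    return [xi - lr * gi for xi, gi in zip(x, grad)]


def adaptive_gd(x0, scales, lr0=0.01, factor=2.0, iters=10):
    x, lr = list(x0), lr0
    for _ in range(iters):
        g = gradient(x, scales)
        # Candidate rates: decrease, keep, or increase the current rate.
        candidates = [lr / factor, lr, lr * factor]
        # Keep the candidate whose one-step result has the lowest objective.
        lr = min(candidates, key=lambda c: objective(step(x, g, c), scales))
        x = step(x, g, lr)
    return x, lr


scales = [1.0, 10.0]   # mildly ill-conditioned quadratic
x_start = [5.0, -3.0]
x_final, lr_final = adaptive_gd(x_start, scales)
print("objective before:", objective(x_start, scales))
print("objective after: ", objective(x_final, scales))
print("final learning rate:", lr_final)
```

Starting from a deliberately small rate, the rule first raises the learning rate over several iterations and then backs off once a doubled rate would overshoot. In the stochastic setting each comparison would be made on noisy objective or gradient estimates, so a practical rule needs some way to account for that noise; this sketch does not attempt it.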

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications

Methods: Stochastic Gradient Descent