Where Do Large Learning Rates Lead Us?

Ildus Sadrtdinov; Maxim Kodryan; Eduard Pokonechny; Ekaterina Lobacheva; Dmitry Vetrov

arXiv:2410.22113·cs.LG·October 30, 2024

TL;DR

This paper empirically investigates how the choice of a large initial learning rate (LR) affects neural network training. It finds that only a narrow range of initial LRs, slightly above the convergence threshold, leads to high-quality minima that generalize well and rely on a sparse set of learned features.

Contribution

The paper identifies a narrow optimal range of initial learning rates and links it to the local geometry of the reached minima and to the sparsity of the learned features.

Findings

01 Optimal initial LRs lie in a narrow range slightly above the convergence threshold.

02 Initial LRs from this range lead to a basin of high-quality minima and a sparse set of learned features.

03 Initial LRs that are too small or too large result in poor generalization and unstable minima.

Abstract

It is generally accepted that starting neural networks training with large learning rates (LRs) improves generalization. Following a line of research devoted to understanding this effect, we conduct an empirical study in a controlled setting focusing on two questions: 1) how large an initial LR is required for obtaining optimal quality, and 2) what are the key differences between models trained with different LRs? We discover that only a narrow range of initial LRs slightly above the convergence threshold lead to optimal results after fine-tuning with a small LR or weight averaging. By studying the local geometry of reached minima, we observe that using LRs from this optimal range allows for the optimization to locate a basin that only contains high-quality minima. Additionally, we show that these initial LRs result in a sparse set of learned features, with a clear focus on those most…
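The two-stage protocol mentioned in the abstract (train with a large initial LR, then either fine-tune with a small LR or average weights) can be sketched on a toy problem. This is a minimal illustration under loud assumptions: a two-dimensional quadratic loss stands in for a neural network, weight averaging is simplified to a uniform average of the last few large-LR iterates, and every name (`loss`, `gd_steps`, `average_weights`) and every numeric value is hypothetical rather than taken from the paper or its repository. It is not meant to reproduce the paper's LR regimes or results.

```python
# Toy illustration only: a quadratic loss stands in for a neural network.
# All names and values are hypothetical and do not come from the paper's code.

def loss(w, curvatures):
    """Toy loss 0.5 * sum(h_i * w_i**2) with per-coordinate curvature h_i."""
    return 0.5 * sum(h * wi * wi for h, wi in zip(curvatures, w))


def gd_steps(w, curvatures, lr, num_steps):
    """Run plain gradient descent with a constant LR; return every iterate."""
    iterates = []
    w = list(w)
    for _ in range(num_steps):
        # The gradient of the toy loss is h_i * w_i for each coordinate.
        w = [wi - lr * h * wi for h, wi in zip(curvatures, w)]
        iterates.append(list(w))
    return iterates


def average_weights(checkpoints):
    """Uniform, coordinate-wise average of a list of weight vectors."""
    n = len(checkpoints)
    return [sum(coords) / n for coords in zip(*checkpoints)]


curvatures = [1.0, 0.1]
w_init = [1.0, 1.0]

# Stage 1: a comparatively large constant LR (chosen so this toy does not diverge).
stage1 = gd_steps(w_init, curvatures, lr=1.9, num_steps=20)
w_stage1 = stage1[-1]

# Option A: fine-tune from the stage-1 weights with a small LR.
w_finetuned = gd_steps(w_stage1, curvatures, lr=0.1, num_steps=50)[-1]

# Option B (simplified): average the last ten stage-1 iterates.
w_averaged = average_weights(stage1[-10:])

print(loss(w_stage1, curvatures))
print(loss(w_finetuned, curvatures))
print(loss(w_averaged, curvatures))
```

In a real experiment the first stage would be an ordinary training loop over a network and dataset, and the comparison of interest would be test quality after option A or B across a sweep of initial LRs.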

Code & Models

Repositories

isadrtdinov/understanding-large-lrs · PyTorch · Official

Videos

Where Do Large Learning Rates Lead Us? · SlidesLive

Taxonomy

Topics: Online Learning and Analytics

Methods: Sparse Evolutionary Training · Focus