When to restart? Exploring escalating restarts on convergence

Ayush K. Varshney; Šarūnas Girdzijauskas; Konstantinos Vandikas; Aneta Vulgarakis Feljan

arXiv:2603.04117·cs.LG·March 5, 2026

TL;DR

This paper introduces SGD-ER, an adaptive learning rate restart strategy that triggers escalations based on training stagnation, improving convergence and test accuracy across multiple datasets and architectures.

Contribution

The paper proposes a novel convergence-aware restart method, SGD-ER, which adaptively escalates learning rates upon stagnation to escape local minima.

Findings

01

SGD-ER improves test accuracy by 0.5-4.5% over standard schedulers.

02

SGD-ER effectively escapes sharp local minima.

03

The method is validated on CIFAR-10, CIFAR-100, and TinyImageNet with various architectures.

Abstract

Learning rate scheduling plays a critical role in the optimization of deep neural networks, directly influencing convergence speed, stability, and generalization. While existing schedulers such as cosine annealing, cyclical learning rates, and warm restarts have shown promise, they often rely on fixed or periodic triggers that are agnostic to the training dynamics, such as stagnation or convergence behavior. In this work, we propose a simple yet effective strategy, which we call Stochastic Gradient Descent with Escalating Restarts (SGD-ER). It adaptively increases the learning rate upon convergence. Our method monitors training progress and triggers restarts when stagnation is detected, linearly escalating the learning rate to escape sharp local minima and explore flatter regions of the loss landscape. We evaluate SGD-ER across CIFAR-10, CIFAR-100, and TinyImageNet on a range of…
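The mechanism the abstract describes (monitor progress, detect stagnation, restart at a linearly escalated learning rate) can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `EscalatingRestart`, the patience-based stagnation test, the `min_delta` threshold, the per-epoch geometric decay between restarts, and the escalation rule `base_lr * (1 + k)` after the k-th restart are all assumptions chosen for illustration.

```python
class EscalatingRestart:
    """Hypothetical convergence-triggered scheduler (illustrative only)."""

    def __init__(self, base_lr=0.1, patience=3, min_delta=1e-3, decay=0.9):
        self.base_lr = base_lr      # assumed starting learning rate
        self.patience = patience    # assumed: epochs without improvement before a restart
        self.min_delta = min_delta  # assumed: smallest loss drop that counts as progress
        self.decay = decay          # assumed: per-epoch multiplicative decay between restarts
        self.lr = base_lr
        self.restarts = 0
        self.best = float("inf")
        self.stale = 0

    def step(self, loss):
        """Update state from the latest monitored loss and return the next learning rate."""
        if loss < self.best - self.min_delta:
            self.best = loss
            self.stale = 0
        else:
            self.stale += 1

        if self.stale >= self.patience:
            # Stagnation detected: restart at a peak that grows linearly
            # with the number of restarts so far.
            self.restarts += 1
            self.lr = self.base_lr * (1 + self.restarts)
            # Forget the old best so a loss spike right after the restart
            # is not immediately counted as stagnation.
            self.best = float("inf")
            self.stale = 0
        else:
            self.lr *= self.decay
        return self.lr


# Usage: feed one monitored loss per epoch and apply the returned rate.
sched = EscalatingRestart(base_lr=0.1, patience=2, min_delta=0.01, decay=0.5)
for epoch_loss in [1.0, 0.5, 0.5, 0.5]:
    next_lr = sched.step(epoch_loss)
```

In contrast to a periodic warm-restart schedule, the restart here fires only when the monitored loss stops improving, which is the distinction the abstract draws against fixed or periodic triggers.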

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques