Rethinking Neural Network Learning Rates: A Stackelberg Perspective

Sihan Zeng; Sujay Bhatt; Sumitra Ganesh

arXiv:2605.15530·cs.LG·May 18, 2026

TL;DR

This paper offers a Stackelberg optimization perspective on neural network training, revealing how non-uniform learning rates can accelerate convergence and improve performance by leveraging problem structure and curvature differences.

Contribution

The paper introduces a Stackelberg reformulation of neural network training, provides convergence guarantees for the resulting algorithm, and explains when and why layer-specific learning rates are beneficial.
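As a rough illustration of what such a reformulation can look like, the block below writes a generic leader-follower (bilevel) objective and the matching two-time-scale updates. The notation is an assumption made for this sketch, not the paper's: $\theta$ stands for the body-layer weights, $w$ for the final-layer weights, $L$ for the training loss, and $\alpha$, $\beta$ for the two learning rates. The constraint sets mentioned in the abstract are omitted here for simplicity.

```latex
% Assumed notation: \theta = body weights (leader), w = final-layer weights (follower).
\min_{\theta} \; F(\theta) := L\bigl(\theta, w^{\star}(\theta)\bigr),
\qquad
w^{\star}(\theta) \in \arg\min_{w} L(\theta, w)

% Two-time-scale alternating gradient descent with \alpha \ll \beta:
w_{k+1} = w_k - \beta \, \nabla_{w} L(\theta_k, w_k),
\qquad
\theta_{k+1} = \theta_k - \alpha \, \nabla_{\theta} L(\theta_k, w_{k+1})
```

In this reading, the larger step $\beta$ lets the final layer track its best response $w^{\star}(\theta)$ while the body moves slowly, which matches the abstract's description of a smaller learning rate for the body layers and a larger one for the final layer.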

Findings

01. Non-uniform learning rates can induce a stronger optimization structure.

02. The Stackelberg objective exhibits sharper local curvature early in training.

03. Experiments confirm improved training speed and performance.

Abstract

Neural networks are typically trained with a single learning rate across all layers. While recent empirical evidence suggests that assigning layer-specific learning rates can accelerate training, a principled understanding of the conditions and mechanisms under which non-uniform learning rates are beneficial remains limited. In this work, we investigate non-uniform learning rates through the lens of Stackelberg optimization. Specifically, we demonstrate that training neural networks with a smaller learning rate for the body layers and a larger learning rate for the final layer can be interpreted as a two-time-scale alternating gradient descent algorithm applied to a Stackelberg reformulation of the original objective. We establish finite-time convergence guarantees for the algorithm under broad conditions that accommodate constraint sets and non-smooth activation functions. Beyond…
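The two-time-scale scheme described in the abstract can be sketched on a toy problem. The example below is a minimal illustration under stated assumptions, not the authors' implementation: the scalar "network" f(x) = a * tanh(w * x), the synthetic data, the function names, and the learning-rate values are all hypothetical choices made for this sketch. The body weight w takes small steps and the final-layer weight a takes larger steps, with the two updates alternating.

```python
import math

# Hypothetical toy data: targets come from y = 2 * tanh(0.5 * x).
XS = [i / 2.0 for i in range(-4, 5)]
DATA = [(x, 2.0 * math.tanh(0.5 * x)) for x in XS]


def loss(w, a):
    """Mean squared error of the scalar network f(x) = a * tanh(w * x)."""
    return sum((a * math.tanh(w * x) - y) ** 2 for x, y in DATA) / len(DATA)


def grad_head(w, a):
    """Gradient of the loss with respect to the final-layer weight a."""
    total = 0.0
    for x, y in DATA:
        h = math.tanh(w * x)
        total += 2.0 * (a * h - y) * h
    return total / len(DATA)


def grad_body(w, a):
    """Gradient of the loss with respect to the body weight w."""
    total = 0.0
    for x, y in DATA:
        h = math.tanh(w * x)
        total += 2.0 * (a * h - y) * a * (1.0 - h * h) * x
    return total / len(DATA)


def train(lr_body, lr_head, steps=500, w=0.1, a=0.1):
    """Alternating gradient descent with separate step sizes per layer."""
    for _ in range(steps):
        # Fast time scale: the final layer (follower) moves with the larger step.
        a -= lr_head * grad_head(w, a)
        # Slow time scale: the body (leader) moves with the smaller step,
        # using the freshly updated final-layer weight.
        w -= lr_body * grad_body(w, a)
    return w, a, loss(w, a)


if __name__ == "__main__":
    w, a, final = train(lr_body=0.02, lr_head=0.2)
    print(f"w={w:.3f} a={a:.3f} loss={final:.5f}")
```

Calling `train` with equal values for `lr_body` and `lr_head` gives a uniform-rate baseline to compare against. In a framework such as PyTorch, a similar non-uniform setup is usually expressed with per-parameter-group learning rates in the optimizer, although those updates are typically simultaneous rather than alternating.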
