WNGrad: Learn the Learning Rate in Gradient Descent
Xiaoxia Wu, Rachel Ward, L\'eon Bottou

TL;DR
WNGrad introduces a nonlinear learning rate update rule inspired by batch normalization, enabling adaptive, robust learning rate adjustment in gradient descent without prior knowledge of loss function parameters, achieving near-optimal convergence.
Contribution
The paper proposes a novel nonlinear learning rate update rule that adapts during training, inspired by batch normalization, providing robustness and near-optimal convergence in various settings.
Findings
Achieves robustness to Lipschitz constant variations.
Attains near-optimal convergence rates in batch and stochastic settings.
Extends robustness to nonconvex and non-smooth deep learning problems.
Abstract
Adjusting the learning rate schedule in stochastic gradient methods is an important unresolved problem which requires tuning in practice. If certain parameters of the loss function such as smoothness or strong convexity constants are known, theoretical learning rate schedules can be applied. However, in practice, such parameters are not known, and the loss function of interest is not convex in any case. The recently proposed batch normalization reparametrization is widely adopted in most neural network architectures today because, among other advantages, it is robust to the choice of Lipschitz constant of the gradient in loss function, allowing one to set a large learning rate without worry. Inspired by batch normalization, we propose a general nonlinear update rule for the learning rate in batch and stochastic gradient descent so that the learning rate can be initialized at a high…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsStochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
MethodsBatch Normalization
