Randomness Helps Rigor: A Probabilistic Learning Rate Scheduler Bridging Theory and Deep Learning Practice
Dahlia Devapriya, Thulasi Tholeti, Janani Suresh, Sheetal Kalyani

TL;DR
This paper introduces a probabilistic learning rate scheduler that allows non-monotonic rates and provides theoretical convergence guarantees, demonstrating improved performance and stability in training deep neural networks.
Contribution
We propose a novel probabilistic learning rate scheduler with proven convergence, bridging the gap between theory and practice in deep learning training.
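To show what "probabilistic" and "non-monotonic" mean in practice, here is a minimal sketch of a random learning rate scheduler. It assumes the rate is drawn independently and uniformly from a fixed interval [eta_min, eta_max] at every step. The class name ProbabilisticLR, the uniform distribution, the bounds, and the toy quadratic objective are all hypothetical illustrations. This summary does not specify the paper's actual sampling rule, and the toy descent loop is not the paper's convergence proof.

```python
import random
from typing import Optional


class ProbabilisticLR:
    """Hypothetical probabilistic learning rate scheduler (illustrative only).

    Assumption: at every step the learning rate is drawn independently and
    uniformly from [eta_min, eta_max]. The paper's actual distribution and
    bounds are not given in this summary.
    """

    def __init__(self, eta_min: float, eta_max: float, seed: Optional[int] = None):
        if not 0.0 < eta_min <= eta_max:
            raise ValueError("require 0 < eta_min <= eta_max")
        self.eta_min = eta_min
        self.eta_max = eta_max
        self._rng = random.Random(seed)  # private RNG so runs are reproducible

    def step(self) -> float:
        # Each call returns a fresh random rate, so consecutive rates can go
        # up as well as down: the sequence is not monotonically decreasing.
        return self._rng.uniform(self.eta_min, self.eta_max)


def sgd_on_quadratic(scheduler: ProbabilisticLR, x0: float, steps: int) -> float:
    """Toy gradient descent on f(x) = x**2 / 2 using randomly sized steps."""
    x = x0
    for _ in range(steps):
        lr = scheduler.step()
        x -= lr * x  # the gradient of x**2 / 2 is x
    return x
```

On this toy objective every update multiplies x by (1 - lr), so any rate in (0, 1) shrinks |x| even though the rates themselves fluctuate. For example, `sgd_on_quadratic(ProbabilisticLR(0.01, 0.1, seed=1), 1.0, 500)` returns a value of magnitude at most 0.99**500, roughly 0.0066.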
Findings
PLRS matches or exceeds state-of-the-art schedulers in accuracy.
PLRS shows more stable convergence on complex datasets.
PLRS outperforms existing schedulers in specific neural network training scenarios.
Abstract
Learning rate schedulers have shown great success in speeding up the convergence of learning algorithms in practice. However, their convergence to a minimum has not been proven theoretically. This difficulty mainly arises from the fact that, while traditional convergence analysis prescribes monotonically decreasing (or constant) learning rates, schedulers opt for rates that often increase and decrease through the training epochs. In this work, we aim to bridge the gap by proposing a probabilistic learning rate scheduler (PLRS) that does not conform to the monotonically decreasing condition yet comes with provable convergence guarantees. To cement the relevance and utility of our work in modern-day applications, we show experimental results on deep neural network architectures such as ResNet, WRN, VGG, and DenseNet on the CIFAR-10, CIFAR-100, and Tiny ImageNet datasets. We show that PLRS performs…
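The tension described in the abstract can be checked directly on two familiar schedules. The sketch below compares a step decay, which satisfies the non-increasing condition assumed by classical analyses, with a cosine schedule that restarts at a fixed period and therefore jumps back up. These two schedules and their hyperparameters are illustrative choices and not necessarily the schedulers benchmarked in the paper. The helper names step_decay, cosine_with_restarts, and is_non_increasing are hypothetical.

```python
import math
from typing import List


def step_decay(eta0: float, gamma: float, step_size: int, epochs: int) -> List[float]:
    # Monotone schedule: multiply the rate by gamma every step_size epochs.
    return [eta0 * gamma ** (t // step_size) for t in range(epochs)]


def cosine_with_restarts(eta_min: float, eta_max: float, period: int, epochs: int) -> List[float]:
    # Non-monotone schedule: cosine decay from eta_max toward eta_min within
    # each period, then a jump back to eta_max when the next period starts.
    rates = []
    for t in range(epochs):
        t_cur = t % period
        rates.append(eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / period)))
    return rates


def is_non_increasing(rates: List[float]) -> bool:
    # The condition that traditional convergence analyses typically require.
    return all(b <= a for a, b in zip(rates, rates[1:]))
```

For instance, `is_non_increasing(step_decay(0.1, 0.1, 30, 90))` is `True`, while `is_non_increasing(cosine_with_restarts(0.0, 0.1, 10, 30))` is `False` because the rate returns to 0.1 at epochs 10 and 20. Schedules of the second kind fall outside classical guarantees, which is the gap PLRS is meant to close.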
