Dynamic Learning Rate Decay for Stochastic Variational Inference

Maximilian Dinkel; Gil Robalo Rei; Wolfgang A. Wall

arXiv:2412.15745 · cs.CE · December 23, 2024

Open Access

TL;DR

This paper introduces a novel adaptive learning rate decay method for Stochastic Variational Inference that reduces sensitivity to initial learning rate choices and improves convergence by monitoring parameter oscillations.

Contribution

The authors propose a new decay strategy based on variational parameter oscillations, enhancing existing adaptive methods with minimal additional memory and computation.
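To make the idea of oscillation-triggered decay concrete, here is a minimal sketch in plain Python. It is not the authors' algorithm: the function name `sgd_with_oscillation_decay`, the sign-flip test, the `flip_fraction` threshold, and the `decay` factor are all assumptions chosen for illustration. The only extra state is one vector holding the previous step, which mirrors the claim of minimal additional memory.

```python
def sgd_with_oscillation_decay(grad_fn, params, lr, decay=0.5,
                               flip_fraction=0.5, steps=100):
    """Plain gradient descent with a hypothetical oscillation-triggered decay.

    Assumption (not from the paper): a coordinate counts as oscillating when
    its new descent direction opposes its previous step. If more than
    `flip_fraction` of the coordinates oscillate, lr is multiplied by `decay`.
    """
    prev_step = [0.0] * len(params)  # the only extra memory: one vector
    for _ in range(steps):
        grads = grad_fn(params)
        # Count coordinates whose descent direction (-g) reversed sign.
        flips = sum(1 for g, s in zip(grads, prev_step) if -g * s < 0)
        if flips > flip_fraction * len(params):
            lr *= decay
        step = [-lr * g for g in grads]
        params = [x + s for x, s in zip(params, step)]
        prev_step = step
    return params, lr


# Toy objective f(x) = 0.5 * sum(x_i^2), whose gradient is x itself.
def quadratic_grad(p):
    return list(p)


# A base learning rate of 2.5 would diverge on this objective without decay.
final_params, final_lr = sgd_with_oscillation_decay(
    quadratic_grad, [1.0, -2.0], lr=2.5
)
print(final_params, final_lr)
```

In this toy run the learning rate is cut only while the iterates overshoot the optimum, and it stays fixed once they approach it from one side. A stochastic setting would need a noise-tolerant oscillation test, which this sketch does not attempt.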

Findings

1. Reduces sensitivity to the initial learning rate setting.

2. Improves convergence stability in variational inference.

3. Is compatible with other adaptive learning rate algorithms.

Abstract

Like many optimization algorithms, Stochastic Variational Inference (SVI) is sensitive to the choice of the learning rate. If the learning rate is too small, the optimization process may be slow, and the algorithm might get stuck in local optima. On the other hand, if the learning rate is too large, the algorithm may oscillate or diverge, failing to converge to a solution. Adaptive learning rate methods such as Adam, AdaMax, Adagrad, or RMSprop automatically adjust the learning rate based on the history of gradients. Nevertheless, if the base learning rate is too large, the variational parameters might still oscillate around the optimal solution. With learning rate schedules, the learning rate can be reduced gradually to mitigate this problem. However, the amount by which the learning rate should be decreased in each iteration is not known a priori, which can significantly impact the…
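The difficulty with fixed schedules can be illustrated with a small sketch. The exponential schedule below is a generic example, not one taken from the paper; the function name `exponential_schedule`, the base rate of 0.1, and the decay factors are assumptions picked for illustration.

```python
def exponential_schedule(lr0, gamma, t):
    """Learning rate after t iterations of a fixed exponential decay."""
    return lr0 * gamma ** t


# Three plausible-looking decay factors lead to very different learning
# rates after the same number of iterations.
for gamma in (0.9, 0.99, 0.999):
    print(gamma, exponential_schedule(0.1, gamma, 1000))
```

With the smallest factor the learning rate is effectively zero long before 1000 iterations, while with the largest it has dropped by less than a factor of three. The decay factor therefore becomes one more hyperparameter that has to be tuned.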

Taxonomy

Topics: Machine Learning and Algorithms · Neural Networks and Applications · Stochastic Gradient Optimization Techniques