Training Neural Networks with Optimal Double-Bayesian Learning
Vy Bui, Hang Yu, Karthik Kantipudi, Ziv Yaniv, and Stefan Jaeger

TL;DR
This paper introduces a novel double-Bayesian probabilistic framework to derive an optimal learning rate for neural network training, improving upon traditional empirical hyperparameter selection methods.
Contribution
It develops a theoretically grounded double-Bayesian decision mechanism for setting the learning rate in stochastic gradient descent, enhancing training effectiveness.
Findings
The theoretically derived learning rate improves training performance across classification, segmentation, and detection tasks.
Experiments validate the practical significance of the double-Bayesian framework.
The approach offers a new perspective on hyperparameter optimization in neural networks.
Abstract
Backpropagation with gradient descent is a common optimization strategy employed by most neural network architectures in machine learning. However, finding optimal hyperparameters to guide training has proven challenging. While it is widely acknowledged that selecting appropriate parameters is crucial for avoiding overfitting and achieving unbiased outcomes, this choice remains largely based on empirical experiments and experience. This paper presents a new probabilistic framework for the learning rate, a key parameter in stochastic gradient descent. The framework develops classic Bayesian statistics into a double-Bayesian decision mechanism involving two antagonistic Bayesian processes. A theoretically optimal learning rate can be derived from these two processes and used for stochastic gradient descent. Experiments across various classification, segmentation, and detection tasks…
