Training Neural Networks with Optimal Double-Bayesian Learning

Vy Bui, Hang Yu, Karthik Kantipudi, Ziv Yaniv, and Stefan Jaeger

arXiv:2605.20009·cs.LG·May 20, 2026

TL;DR

This paper introduces a novel double-Bayesian probabilistic framework to derive an optimal learning rate for neural network training, improving upon traditional empirical hyperparameter selection methods.

Contribution

It develops a theoretically grounded double-Bayesian decision mechanism for setting the learning rate in stochastic gradient descent, enhancing training effectiveness.

Findings

01

The theoretically derived learning rate improves training performance across classification, segmentation, and detection tasks.

02

Experiments validate the practical significance of the double-Bayesian framework.

03

The approach offers a new perspective on hyperparameter optimization in neural networks.

Abstract

Backpropagation with gradient descent is a common optimization strategy employed by most neural network architectures in machine learning. However, finding optimal hyperparameters to guide training has proven challenging. While it is widely acknowledged that selecting appropriate parameters is crucial for avoiding overfitting and achieving unbiased outcomes, this choice remains largely based on empirical experiments and experience. This paper presents a new probabilistic framework for the learning rate, a key parameter in stochastic gradient descent. The framework develops classic Bayesian statistics into a double-Bayesian decision mechanism involving two antagonistic Bayesian processes. A theoretically optimal learning rate can be derived from these two processes and used for stochastic gradient descent. Experiments across various classification, segmentation, and detection tasks…
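For orientation, the following is a minimal sketch of where the learning rate enters a plain stochastic gradient descent update. The toy regression problem, the function name `sgd_fit_slope`, and the three rates tried are illustrative assumptions. None of them is the rate derived in the paper, which is not stated on this page.

```python
import random


def sgd_fit_slope(data, learning_rate, epochs=50, seed=0):
    """Fit y ~ w * x with per-sample SGD on the squared error.

    Hypothetical helper for illustration only; it is not the
    double-Bayesian method described in the abstract.
    """
    rng = random.Random(seed)
    samples = list(data)
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(samples)  # visit samples in a fresh random order
        for x, y in samples:
            grad = 2.0 * (w * x - y) * x  # d/dw of (w * x - y) ** 2
            w -= learning_rate * grad     # the learning rate scales each step
    return w


# Noise-free toy data with true slope 3.0 (an assumption for the demo).
data = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]

w_small = sgd_fit_slope(data, learning_rate=0.01)            # converges slowly
w_mid = sgd_fit_slope(data, learning_rate=0.1)               # converges quickly
w_large = sgd_fit_slope(data, learning_rate=1.0, epochs=10)  # diverges
```

On this toy problem the same update rule converges slowly, converges quickly, or diverges depending only on `learning_rate`, which is the sensitivity that motivates deriving the rate from theory rather than choosing it by trial and error.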
