Improving Infinitely Deep Bayesian Neural Networks with Nesterov's Accelerated Gradient Method
Chenxu Yu, Wenqi Fang

TL;DR
This paper introduces a Nesterov-accelerated gradient method for Bayesian neural networks based on stochastic differential equations, reducing the number of function evaluations (NFEs) needed by the SDE solver and improving convergence and predictive accuracy.
Contribution
The paper proposes integrating Nesterov acceleration into SDE-based Bayesian neural networks with an NFE-dependent residual connection, enhancing efficiency and performance.
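As a rough illustration of the acceleration idea only, the sketch below compares plain gradient descent with the textbook Nesterov accelerated gradient update on a toy quadratic. It is not the paper's SDE-BNN formulation, which this summary does not spell out; the objective, the step size `lr`, the `momentum` value, and all function names are assumptions chosen for the example.

```python
# Toy objective (assumed for illustration): f(x) = 0.5 * sum(c_i * x_i^2).
def grad(x, curvatures):
    return [c * xi for c, xi in zip(curvatures, x)]


def converged(x, curvatures, tol):
    return max(abs(g) for g in grad(x, curvatures)) < tol


def gradient_descent(x0, curvatures, lr, tol=1e-6, max_iter=100_000):
    """Plain gradient descent; returns the final point and the steps taken."""
    x = list(x0)
    for step in range(max_iter):
        if converged(x, curvatures, tol):
            return x, step
        g = grad(x, curvatures)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x, max_iter


def nesterov(x0, curvatures, lr, momentum, tol=1e-6, max_iter=100_000):
    """Textbook Nesterov update: the gradient is taken at a look-ahead point."""
    x = list(x0)
    v = [0.0] * len(x)
    for step in range(max_iter):
        if converged(x, curvatures, tol):
            return x, step
        lookahead = [xi + momentum * vi for xi, vi in zip(x, v)]
        g = grad(lookahead, curvatures)
        v = [momentum * vi - lr * gi for vi, gi in zip(v, g)]
        x = [xi + vi for xi, vi in zip(x, v)]
    return x, max_iter


curvatures = [1.0, 100.0]  # ill-conditioned on purpose
start = [1.0, 1.0]
_, gd_steps = gradient_descent(start, curvatures, lr=0.01)
_, nag_steps = nesterov(start, curvatures, lr=0.01, momentum=0.9)
print("gradient descent steps:", gd_steps)
print("Nesterov steps:", nag_steps)
```

On this ill-conditioned toy problem the look-ahead update needs fewer update steps than plain gradient descent, which is the kind of saving the paper targets in terms of function evaluations.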
Findings
Reduces the number of function evaluations (NFEs) during both training and testing.
Achieves higher predictive accuracy than conventional SDE-BNNs.
Demonstrates effectiveness on image classification and sequence modeling tasks.
Abstract
As a representative continuous-depth neural network approach, stochastic differential equation (SDE)-based Bayesian neural networks (BNNs) have attracted considerable attention due to their solid theoretical foundations and strong potential for real-world applications. However, their reliance on numerical SDE solvers inevitably incurs a large number of function evaluations (NFEs), resulting in high computational cost and occasional convergence instability. To address these challenges, we propose a Nesterov-accelerated gradient (NAG) enhanced SDE-BNN model. By integrating NAG into the SDE-BNN framework along with an NFE-dependent residual skip connection, our method accelerates convergence and substantially reduces NFEs during both training and testing. Extensive empirical results show that our model consistently outperforms conventional SDE-BNNs across various tasks, including image…
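To make the NFE cost mentioned in the abstract concrete, here is a minimal sketch of a fixed-step Euler-Maruyama solver that counts drift evaluations. The scalar mean-reverting drift, the constant diffusion, and the `CountedFunction` helper are hypothetical stand-ins; in an SDE-BNN the drift would be a neural network, so each counted call would correspond to a network forward pass.

```python
import math
import random


class CountedFunction:
    """Wraps a function of (t, x) and counts how often it is called."""

    def __init__(self, fn):
        self.fn = fn
        self.nfe = 0

    def __call__(self, t, x):
        self.nfe += 1
        return self.fn(t, x)


def euler_maruyama(drift, diffusion, x0, t0, t1, n_steps, rng):
    """Fixed-step Euler-Maruyama integration of dx = drift dt + diffusion dW."""
    dt = (t1 - t0) / n_steps
    t, x = t0, x0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over dt
        x = x + drift(t, x) * dt + diffusion(t, x) * dw
        t += dt
    return x


# Toy scalar dynamics, assumed purely for illustration.
drift = CountedFunction(lambda t, x: -x)


def diffusion(t, x):
    return 0.1


rng = random.Random(0)
x_final = euler_maruyama(drift, diffusion, x0=1.0, t0=0.0, t1=1.0,
                         n_steps=100, rng=rng)
print("drift evaluations (NFE):", drift.nfe)
```

With a fixed-step scheme the count grows linearly with the number of solver steps; adaptive solvers choose the step count themselves, which is why NFE is a natural cost measure for this family of models.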
Taxonomy
Topics: Model Reduction and Neural Networks · Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning
