Improving Infinitely Deep Bayesian Neural Networks with Nesterov's Accelerated Gradient Method

Chenxu Yu; Wenqi Fang

arXiv:2603.25024·stat.ML·March 27, 2026

Open Access

TL;DR

This paper integrates Nesterov-accelerated gradient (NAG) into Bayesian neural networks based on stochastic differential equations (SDE-BNNs), reducing the number of function evaluations (NFEs) the SDE solver needs while improving convergence and accuracy.

Contribution

The paper proposes integrating Nesterov acceleration into SDE-based Bayesian neural networks with an NFE-dependent residual connection, enhancing efficiency and performance.

Findings

01. Reduces the number of function evaluations during training and testing.

02. Achieves higher predictive accuracy compared to traditional SDE-BNNs.

03. Demonstrates effectiveness on image classification and sequence modeling tasks.

Abstract

As a representative continuous-depth neural network approach, stochastic differential equation (SDE)-based Bayesian neural networks (BNNs) have attracted considerable attention due to their solid theoretical foundations and strong potential for real-world applications. However, their reliance on numerical SDE solvers inevitably incurs a large number of function evaluations (NFEs), resulting in high computational cost and occasional convergence instability. To address these challenges, we propose a Nesterov-accelerated gradient (NAG) enhanced SDE-BNN model. By integrating NAG into the SDE-BNN framework along with an NFE-dependent residual skip connection, our method accelerates convergence and substantially reduces NFEs during both training and testing. Extensive empirical results show that our model consistently outperforms conventional SDE-BNNs across various tasks, including image…
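For readers unfamiliar with the acceleration technique named in the abstract, the following is a minimal sketch of the classical Nesterov accelerated gradient update, compared against plain gradient descent on a toy quadratic. It is not the authors' SDE-BNN formulation, which the abstract does not spell out. The function names, the test problem, the step size, and the momentum value are all illustrative assumptions.

```python
import numpy as np


def gradient_descent(grad, x0, lr, steps):
    """Plain gradient descent, used here only as a baseline."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - lr * grad(x)
    return x


def nesterov(grad, x0, lr, momentum, steps):
    """Classical Nesterov accelerated gradient (look-ahead momentum form)."""
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(steps):
        # Evaluate the gradient at the look-ahead point, not at x itself.
        lookahead = x + momentum * v
        v = momentum * v - lr * grad(lookahead)
        x = x + v
    return x


# Hypothetical test problem: f(x) = 0.5 * x^T A x, minimized at the origin.
A = np.diag([1.0, 100.0])


def grad(x):
    return A @ x


x0 = np.array([1.0, 1.0])
gd_err = np.linalg.norm(gradient_descent(grad, x0, lr=0.01, steps=100))
nag_err = np.linalg.norm(nesterov(grad, x0, lr=0.01, momentum=0.8, steps=100))
print(f"gradient descent error: {gd_err:.4f}")
print(f"Nesterov error:         {nag_err:.4f}")
```

With the same step size and iteration budget, the look-ahead momentum term lets the iterate move faster along the poorly conditioned direction. This is the general intuition behind using NAG to cut the number of solver steps, though how the paper couples it to the SDE dynamics and the residual skip connection is not shown here.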

Taxonomy

Topics: Model Reduction and Neural Networks · Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning