Beyond NNGP: Large Deviations and Feature Learning in Bayesian Neural Networks
Katerina Papagiannouli, Dario Trevisan, Giuseppe Pio Zitto

TL;DR
This paper explores the behavior of wide Bayesian neural networks beyond Gaussian-process limits, using large deviation theory to understand fluctuations, feature learning, and posterior concentration at the functional level.
Contribution
It introduces a variational framework, based on large-deviation rate functions, for analyzing posterior behavior. Unlike fixed-kernel (NNGP) theory, the framework optimizes jointly over predictors and internal kernels.
Findings
Accurately describes finite-width network behavior
Captures non-Gaussian tail effects
Reveals data-dependent kernel selection
Abstract
We study wide Bayesian neural networks, focusing on the rare but statistically dominant fluctuations that govern posterior concentration beyond Gaussian-process limits. Large-deviation theory provides explicit variational objectives (rate functions) on predictors, yielding an emerging notion of complexity and feature learning directly at the functional level. We show that the posterior output rate function is obtained by a joint optimization over predictors and internal kernels, in contrast with fixed-kernel (NNGP) theory. Numerical experiments demonstrate that the resulting predictions accurately describe finite-width behavior for moderately sized networks, capturing non-Gaussian tails, posterior deformation, and data-dependent kernel selection effects.
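For readers unfamiliar with the fixed-kernel baseline that the abstract contrasts against, the following is a minimal sketch of the standard NNGP kernel recursion for a fully connected ReLU network, using the well-known closed form for the Gaussian expectation of a product of ReLUs (the arc-cosine kernel). It is not taken from the paper. The function names `relu_expectation` and `nngp_kernel`, and the parameters `sigma_w2` and `sigma_b2` (weight and bias variances), are hypothetical choices for illustration. The point it shows is that in the NNGP limit the kernel is determined by the inputs and the architecture alone and is never adapted to the labels, whereas the paper's rate function involves an optimization over internal kernels.

```python
import numpy as np

def relu_expectation(K):
    """E[relu(u) relu(v)] for (u, v) ~ N(0, K), computed for all input pairs.

    Closed form (arc-cosine kernel of degree 1):
    sqrt(K_ii K_jj) * (sin t + (pi - t) cos t) / (2 pi), with cos t = K_ij / sqrt(K_ii K_jj).
    Assumes every diagonal entry of K is strictly positive.
    """
    std = np.sqrt(np.diag(K))
    norm = np.outer(std, std)
    cos_t = np.clip(K / norm, -1.0, 1.0)  # clip guards against rounding error
    t = np.arccos(cos_t)
    return norm * (np.sin(t) + (np.pi - t) * cos_t) / (2.0 * np.pi)

def nngp_kernel(X, depth, sigma_w2=2.0, sigma_b2=0.0):
    """Fixed (data-label-independent) NNGP kernel after `depth` hidden ReLU layers.

    X has shape (n_samples, n_features). The defaults sigma_w2=2, sigma_b2=0
    are an illustrative assumption: they keep the kernel diagonal constant
    across layers.
    """
    K = sigma_b2 + sigma_w2 * (X @ X.T) / X.shape[1]
    for _ in range(depth):
        K = sigma_b2 + sigma_w2 * relu_expectation(K)
    return K

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
K = nngp_kernel(X, depth=3)
print(K.shape)
```

Note that the recursion takes only the inputs `X`; the targets never enter. Any data-dependent kernel selection of the kind described in the abstract is therefore outside what this baseline can represent.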
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Stochastic Gradient Optimization Techniques · Generative Adversarial Networks and Image Synthesis
