Beyond NNGP: Large Deviations and Feature Learning in Bayesian Neural Networks

Katerina Papagiannouli; Dario Trevisan; Giuseppe Pio Zitto

arXiv:2602.22925 · stat.ML · February 27, 2026

TL;DR

This paper explores the behavior of wide Bayesian neural networks beyond Gaussian-process limits, using large deviation theory to understand fluctuations, feature learning, and posterior concentration at the functional level.

Contribution

It introduces a variational framework based on large-deviation rate functions for analyzing posterior behavior. Unlike traditional fixed-kernel (NNGP) approaches, the framework optimizes over internal kernels jointly with the predictor.
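To make "rate function" concrete, here is a toy sketch of the scalar textbook case, not the paper's functional-level rate function. Cramér's theorem says the probability that the mean of n i.i.d. samples lands near x decays like exp(−n I(x)), where I is the Legendre transform of the log moment generating function. Using Bernoulli(p) samples is our own assumption, chosen because I then has a closed form (a KL divergence) to check the numerical Legendre transform against. The function names are hypothetical.

```python
import math


def log_mgf_bernoulli(lam: float, p: float) -> float:
    """Log moment generating function Lambda(lam) = log E[exp(lam * X)], X ~ Bernoulli(p)."""
    return math.log(1.0 - p + p * math.exp(lam))


def rate_legendre(x: float, p: float, lam_max: float = 30.0, steps: int = 60001) -> float:
    """Cramer rate function I(x) = sup_lam [lam * x - Lambda(lam)].

    The supremum is approximated by a brute-force grid over [-lam_max, lam_max];
    the objective is concave in lam, so a fine grid suffices for 0 < x < 1.
    """
    best = -math.inf
    for i in range(steps):
        lam = -lam_max + 2.0 * lam_max * i / (steps - 1)
        best = max(best, lam * x - log_mgf_bernoulli(lam, p))
    return best


def rate_closed_form(x: float, p: float) -> float:
    """Closed form for Bernoulli(p): KL(Bernoulli(x) || Bernoulli(p)), valid for 0 < x < 1."""
    return x * math.log(x / p) + (1.0 - x) * math.log((1.0 - x) / (1.0 - p))


if __name__ == "__main__":
    p = 0.5
    for x in (0.5, 0.6, 0.7, 0.9):
        # The two columns should agree; I(x) = 0 only at x = p and grows away from it.
        print(f"x={x:.1f}  Legendre={rate_legendre(x, p):.6f}  "
              f"closed form={rate_closed_form(x, p):.6f}")
```

The rate function is zero at the typical value x = p and positive elsewhere, which is what makes exp(−n I(x)) a statement about rare fluctuations.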

Findings

01. Accurately describes finite-width network behavior
02. Captures non-Gaussian tail effects
03. Reveals data-dependent kernel selection

Abstract

We study wide Bayesian neural networks, focusing on the rare but statistically dominant fluctuations that govern posterior concentration, beyond Gaussian-process limits. Large-deviation theory provides explicit variational objectives (rate functions) on predictors, yielding an emergent notion of complexity and feature learning directly at the functional level. We show that the posterior output rate function is obtained by a joint optimization over predictors and internal kernels, in contrast with fixed-kernel (NNGP) theory. Numerical experiments demonstrate that the resulting predictions accurately describe finite-width behavior for moderately sized networks, capturing non-Gaussian tails, posterior deformation, and data-dependent kernel selection effects.
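The fixed-kernel (NNGP) baseline that the abstract contrasts with can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' code: a one-hidden-layer ReLU network with i.i.d. Gaussian weights of variance σ_w²/fan-in and biases of variance σ_b², for which the infinite-width kernel has the standard arc-cosine closed form. All function names are hypothetical. In this limit the kernel is fixed by the prior alone and the posterior mean is ordinary GP regression. The `empirical_kernel` helper estimates the same kernel from one finite-width network, so one can watch the fluctuations around the fixed kernel shrink as the width grows.

```python
import numpy as np

# Hypothetical helper names; an illustrative sketch of the standard NNGP baseline only.


def first_layer_cov(X: np.ndarray, sigma_w: float, sigma_b: float) -> np.ndarray:
    """Covariance of first-layer preactivations: sigma_b^2 + sigma_w^2 * <x, x'> / d."""
    d = X.shape[1]
    return sigma_b**2 + sigma_w**2 * (X @ X.T) / d


def relu_nngp_kernel(X: np.ndarray, sigma_w: float = 1.0, sigma_b: float = 0.1) -> np.ndarray:
    """Infinite-width output kernel of a one-hidden-layer ReLU network (arc-cosine form)."""
    K1 = first_layer_cov(X, sigma_w, sigma_b)
    scale = np.sqrt(np.outer(np.diag(K1), np.diag(K1)))  # positive as long as sigma_b > 0
    cos_t = np.clip(K1 / scale, -1.0, 1.0)
    theta = np.arccos(cos_t)
    # E[relu(u) relu(v)] for jointly Gaussian (u, v) with covariance read off K1
    relu_moment = scale * (np.sin(theta) + (np.pi - theta) * cos_t) / (2.0 * np.pi)
    return sigma_b**2 + sigma_w**2 * relu_moment


def empirical_kernel(X: np.ndarray, width: int, sigma_w: float = 1.0,
                     sigma_b: float = 0.1, seed: int = 0) -> np.ndarray:
    """Same kernel estimated from ONE random network of finite width (readout variance sigma_w^2 / width)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(0.0, sigma_w / np.sqrt(d), size=(d, width))
    b = rng.normal(0.0, sigma_b, size=width)
    H = np.maximum(X @ W + b, 0.0)  # hidden ReLU features, shape (n, width)
    return sigma_b**2 + sigma_w**2 * (H @ H.T) / width


def nngp_posterior_mean(X_train: np.ndarray, y_train: np.ndarray, X_test: np.ndarray,
                        noise: float = 0.1, **kernel_kw) -> np.ndarray:
    """GP regression with the fixed NNGP kernel: K_*X (K_XX + noise^2 I)^{-1} y."""
    n = X_train.shape[0]
    K = relu_nngp_kernel(np.vstack([X_train, X_test]), **kernel_kw)
    K_xx = K[:n, :n] + noise**2 * np.eye(n)
    K_sx = K[n:, :n]
    return K_sx @ np.linalg.solve(K_xx, y_train)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(4, 3))
    K_inf = relu_nngp_kernel(X)
    for width in (10, 100, 10_000):
        gap = np.max(np.abs(empirical_kernel(X, width) - K_inf))
        print(f"width={width:>6}  max |K_width - K_NNGP| = {gap:.4f}")
    y = np.sin(X[:, 0])
    X_new = rng.normal(size=(2, 3))
    print("NNGP posterior mean at new inputs:", nngp_posterior_mean(X, y, X_new))
```

Scaling the readout variance by 1/width is the design choice that makes the finite-width kernel converge to a deterministic limit. Per the abstract, the paper's point is that posterior concentration is instead governed by a joint optimization over predictors and internal kernels, which this sketch does not attempt.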

Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Stochastic Gradient Optimization Techniques · Generative Adversarial Networks and Image Synthesis