The Bayesian Geometry of Transformer Attention
Naman Agarwal, Siddhartha R. Dalal, Vishal Misra

TL;DR
This paper argues that transformer attention implements Bayesian inference through a consistent geometric mechanism. The claim is tested in controlled environments where the true posterior is known in closed form, and where capacity-matched MLPs fail on the same tasks.
Contribution
It introduces Bayesian wind tunnels, controlled environments with analytically known posteriors in which memorization is impossible, and identifies the geometric mechanism by which transformers implement Bayesian reasoning.
Findings
Transformers reproduce Bayesian posteriors with high accuracy in controlled settings.
Attention mechanisms serve as content-addressable routing for Bayesian updates.
Training reveals a low-dimensional posterior entropy manifold aligned with attention patterns.
Abstract
Transformers often appear to perform Bayesian reasoning in context, but verifying this rigorously has been impossible: natural data lack analytic posteriors, and large models conflate reasoning with memorization. We address this by constructing Bayesian wind tunnels -- controlled environments where the true posterior is known in closed form and memorization is provably impossible. In these settings, small transformers reproduce Bayesian posteriors with - bit accuracy, while capacity-matched MLPs fail by orders of magnitude, establishing a clear architectural separation. Across two tasks -- bijection elimination and Hidden Markov Model (HMM) state tracking -- we find that transformers implement Bayesian inference through a consistent geometric mechanism: residual streams serve as the belief substrate, feed-forward networks perform the posterior update, and…
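To make "true posterior known in closed form" concrete, here is a minimal Python sketch of the reference posteriors for the two tasks the abstract names. It is an illustration under stated assumptions, not the authors' code. The HMM part is the standard forward filtering recursion. The bijection part assumes one plausible reading of the task: input-output pairs of a random bijection are revealed one at a time, so the output of an unseen input is uniform over the outputs not yet eliminated. All function names and the toy parameters are hypothetical.

```python
import math


def hmm_filter(init, trans, emit, observations):
    """Standard HMM forward filtering.

    init[i]     : prior probability of state i before the first observation
    trans[i][j] : P(next state = j | current state = i)
    emit[j][o]  : P(observation = o | state = j)
    Returns a list of posteriors P(state_t | obs_1..t), one per time step.
    """
    n = len(init)
    belief = list(init)
    posteriors = []
    for t, obs in enumerate(observations):
        if t > 0:
            # Predict step: push the belief through the transition matrix.
            belief = [sum(belief[i] * trans[i][j] for i in range(n))
                      for j in range(n)]
        # Update step: weight by the observation likelihood, then renormalize.
        unnorm = [belief[j] * emit[j][obs] for j in range(n)]
        z = sum(unnorm)
        belief = [u / z for u in unnorm]
        posteriors.append(belief)
    return posteriors


def entropy_bits(p):
    """Shannon entropy of a discrete distribution, in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def bijection_posterior_entropy_bits(vocab_size, num_revealed):
    """Posterior entropy for an unseen input of a random bijection.

    Assumption (not confirmed by the abstract): after num_revealed distinct
    input-output pairs are shown, the remaining outputs are equally likely.
    """
    return math.log2(vocab_size - num_revealed)


# Toy two-state HMM (hypothetical parameters, chosen only for illustration).
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = [[0.8, 0.2], [0.2, 0.8]]
posteriors = hmm_filter(init, trans, emit, [0, 0])
for t, p in enumerate(posteriors):
    print(t, [round(x, 4) for x in p], round(entropy_bits(p), 4))
```

A model's predictive distribution at each position can then be compared against these reference posteriors, for example by comparing entropies in bits; the specific error metric used in the paper is not stated in the text shown here.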
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
