Transformers are Bayesian Networks

Gregory Coppola

arXiv:2603.17063·cs.AI·March 19, 2026

Open Access

TL;DR

This paper argues that a transformer is a Bayesian network: every sigmoid transformer implements weighted loopy belief propagation on its implicit factor graph, and a transformer can be constructed to run exact belief propagation on a declared knowledge base.

Contribution

The paper proves, with formal verification, that sigmoid transformers implement belief propagation, and uses that result to analyze the architecture's structure and inference capabilities as a Bayesian network.

Findings

01

Sigmoid transformers perform weighted loopy belief propagation on their implicit factor graph, one round per layer, for any weights.

02

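A transformer can be constructed to implement exact belief propagation on any declared knowledge base; the probability estimates are provably correct when the knowledge base has no circular dependencies.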

03

The architecture's AND/OR structure aligns with Pearl's algorithm.

Abstract

Transformers are the dominant architecture in AI, yet why they work remains poorly understood. This paper offers a precise answer: a transformer is a Bayesian network. We establish this in five ways. First, we prove that every sigmoid transformer implements weighted loopy belief propagation (BP) on its implicit factor graph, with one layer corresponding to one round of BP. This holds for any weights: trained, random, or constructed. The result is formally verified against standard mathematical axioms. Second, we give a constructive proof that a transformer can implement exact belief propagation on any declared knowledge base. On knowledge bases without circular dependencies this yields provably correct probability estimates at every node. This result is also formally verified against standard mathematical axioms. Third, we prove uniqueness: a sigmoid transformer that produces exact posteriors necessarily has BP…
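The abstract's second claim, exactness on knowledge bases without circular dependencies, echoes a standard property of sum-product belief propagation: on a tree-structured graph, the beliefs equal the true marginals once messages have crossed the graph. The sketch below illustrates only that property. It is not the paper's construction and contains no transformer; the three-variable chain, the potential tables, and every function name are hypothetical choices made for illustration.

```python
from itertools import product

# Hypothetical toy model: three binary variables in a chain A - B - C.
unary = {
    "A": [0.6, 0.4],
    "B": [0.5, 0.5],
    "C": [0.3, 0.7],
}
# pairwise[(u, v)][x_u][x_v] is the potential on edge u - v.
pairwise = {
    ("A", "B"): [[0.9, 0.1], [0.2, 0.8]],
    ("B", "C"): [[0.7, 0.3], [0.4, 0.6]],
}


def edge_potential(u, v, xu, xv):
    """Look up the potential for edge u - v in either orientation."""
    if (u, v) in pairwise:
        return pairwise[(u, v)][xu][xv]
    return pairwise[(v, u)][xv][xu]


def neighbors(u):
    return [b if a == u else a for (a, b) in pairwise if u in (a, b)]


def normalize(vec):
    total = sum(vec)
    return [x / total for x in vec]


def bp_round(messages):
    """One synchronous sum-product update of every directed message."""
    new = {}
    for (u, v) in messages:
        out = []
        for xv in (0, 1):
            s = 0.0
            for xu in (0, 1):
                # Product of messages into u from all neighbors except v.
                incoming = 1.0
                for w in neighbors(u):
                    if w != v:
                        incoming *= messages[(w, u)][xu]
                s += unary[u][xu] * edge_potential(u, v, xu, xv) * incoming
            out.append(s)
        new[(u, v)] = normalize(out)
    return new


def bp_marginals(rounds):
    """Run `rounds` synchronous updates from uniform messages, then read beliefs."""
    messages = {}
    for (a, b) in pairwise:
        messages[(a, b)] = [0.5, 0.5]
        messages[(b, a)] = [0.5, 0.5]
    for _ in range(rounds):
        messages = bp_round(messages)
    beliefs = {}
    for u in unary:
        belief = list(unary[u])
        for w in neighbors(u):
            belief = [belief[x] * messages[(w, u)][x] for x in (0, 1)]
        beliefs[u] = normalize(belief)
    return beliefs


def brute_force_marginals():
    """Exact marginals by enumerating all joint assignments."""
    names = list(unary)
    totals = {u: [0.0, 0.0] for u in names}
    for assignment in product((0, 1), repeat=len(names)):
        x = dict(zip(names, assignment))
        weight = 1.0
        for u in names:
            weight *= unary[u][x[u]]
        for (a, b), table in pairwise.items():
            weight *= table[x[a]][x[b]]
        for u in names:
            totals[u][x[u]] += weight
    return {u: normalize(t) for u, t in totals.items()}
```

Calling `bp_marginals(rounds=2)` and `brute_force_marginals()` on this chain should give matching marginals, because two synchronous rounds are enough for messages to travel the chain's full length. If an edge were added that closed a cycle, the same update rule would become loopy BP, and agreement with enumeration would no longer be guaranteed.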

Taxonomy

Topics: Bayesian Modeling and Causal Inference · Advanced Graph Neural Networks · Logic, Reasoning, and Knowledge