LLMs are Bayesian, In Expectation, Not in Realization
Leon Chlon, Zein Khamis, Maggie Chlon, Mahdi El Zein, MarcAntonio M. Awada

TL;DR
This paper argues that apparent violations of Bayesian behavior in transformer models arise because positional encodings break exchangeability. It shows, through theoretical bounds and empirical evidence, that the models align with Bayesian principles in expectation over orderings.
Contribution
It attributes exchangeability violations to positional encodings and provides bounds and empirical evidence that models behave in a Bayesian manner in expectation rather than in realization.
Findings
Permutation-induced dispersion is bounded under certain assumptions.
Expected MDL/compression over permutations is near-optimal.
Order-induced variance decreases with context length and positional encoding modifications.
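To make the MDL finding concrete, here is a minimal sketch of what an exchangeable Bayesian reference coder looks like for Bernoulli data. It uses the Krichevsky-Trofimov (Beta(1/2, 1/2)) mixture, which is a standard textbook choice and an assumption here, not necessarily the coder analyzed in the paper. The function names and the example sequence are hypothetical.

```python
import math
import random

def kt_code_length(bits):
    """Sequential code length (in bits) under the Krichevsky-Trofimov mixture."""
    ones = 0
    total = 0.0
    for t, x in enumerate(bits):
        # Predictive probability of a 1 after seeing t symbols, `ones` of them 1.
        p_one = (ones + 0.5) / (t + 1.0)
        p = p_one if x == 1 else 1.0 - p_one
        total -= math.log2(p)
        ones += x
    return total

def empirical_entropy_bits(bits):
    """n * H(k/n): code length of the best Bernoulli model chosen in hindsight."""
    n, k = len(bits), sum(bits)
    if k in (0, n):
        return 0.0
    q = k / n
    return -n * (q * math.log2(q) + (1 - q) * math.log2(1 - q))

rng = random.Random(0)
bits = [1] * 12 + [0] * 20  # illustrative multiset: 12 ones, 20 zeros

# Code length of the same multiset under many random orderings.
lengths = []
for _ in range(200):
    perm = bits[:]
    rng.shuffle(perm)
    lengths.append(kt_code_length(perm))

spread = max(lengths) - min(lengths)  # order sensitivity of the coder
regret = lengths[0] - empirical_entropy_bits(bits)  # overhead vs. hindsight optimum
print(f"spread across orderings: {spread:.2e} bits")
print(f"regret vs. n*H(k/n): {regret:.3f} bits")
```

Because this coder depends on the data only through counts, its code length is the same for every ordering up to floating-point error, and its overhead stays within the classical (1/2) log2(n) + 1 bits of the hindsight optimum. A model with positional encodings has no such invariance, which is why the findings above are stated for the average over permutations rather than for each ordering.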
Abstract
Exchangeability-based martingale diagnostics have been used to question Bayesian explanations of transformer in-context learning. We show that these violations are compatible with Bayesian/MDL behavior once we account for a basic architectural fact: positional encodings break exchangeability. Accordingly, the relevant baseline is performance in expectation over orderings of an exchangeable multiset, not performance under every fixed ordering. In a Bernoulli microscope (under explicit regularity assumptions), we bound the permutation-induced dispersion detected by martingale diagnostics (Theorem 3.4) while proving near-optimal expected MDL/compression over permutations (Theorem 3.6). Empirically, black-box next-token log-probabilities from an Azure OpenAI deployment exhibit nonzero expectation-realization gaps that decay with context length (mean 0.74 at to 0.26 at ;…
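The distinction between performance in expectation over orderings and performance under a fixed ordering can be illustrated with a toy predictor. The sketch below is not the paper's model or its gap statistic: the recency-weighted predictor, the `decay` parameter, and the mean-absolute-deviation definition of the gap are all assumptions made for illustration. Setting `decay = 1.0` gives an exchangeable predictor, while `decay < 1.0` stands in for position sensitivity.

```python
import math
import random
import statistics

def toy_log_prob_next_one(context, decay):
    """Log-probability that the next bit is 1 under a recency-weighted toy predictor.

    decay == 1.0 weights every position equally (order-invariant);
    decay < 1.0 down-weights older positions (order-sensitive).
    """
    n = len(context)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    num = 0.5 + sum(w * x for w, x in zip(weights, context))
    den = 1.0 + sum(weights)
    return math.log(num / den)

def expectation_realization_gap(context, decay, n_perms=500, seed=0):
    """Mean absolute deviation of per-ordering log-probs from their permutation average."""
    rng = random.Random(seed)
    log_probs = []
    for _ in range(n_perms):
        perm = context[:]
        rng.shuffle(perm)
        log_probs.append(toy_log_prob_next_one(perm, decay))
    mean_lp = statistics.fmean(log_probs)  # the "in expectation" quantity
    return statistics.fmean([abs(lp - mean_lp) for lp in log_probs])

context = [1] * 8 + [0] * 8  # illustrative multiset of 16 bits
gap_exchangeable = expectation_realization_gap(context, decay=1.0)
gap_positional = expectation_realization_gap(context, decay=0.8)
print(f"gap with decay=1.0: {gap_exchangeable:.4f}")
print(f"gap with decay=0.8: {gap_positional:.4f}")
```

The exchangeable predictor shows no gap, while the position-sensitive one shows a clearly nonzero gap on the same multiset. This toy does not reproduce the decay with context length reported in the abstract, since its recency window is fixed; it only shows how a nonzero gap can arise from order sensitivity alone.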
Taxonomy
Topics: Reservoir Engineering and Simulation Methods
