Are LLM Belief Updates Consistent with Bayes' Theorem?

Sohaib Imran; Ihor Kendiukhov; Matthew Broerman; Aditya Thomas; Riccardo Campanella; Rob Lamb; Peter M. Atkinson

arXiv:2507.17951 · cs.CL · July 25, 2025

TL;DR

This paper introduces the Bayesian Coherence Coefficient (BCC) to evaluate how closely large language models update their beliefs in accordance with Bayes' theorem. It finds that larger, more capable pre-trained models tend to be more coherent.

Contribution

The paper proposes the BCC metric and a dataset for measuring it. It then provides empirical evidence that larger, more capable pre-trained language models adhere more closely to Bayesian belief updating.

Findings

01

Larger models show higher BCC scores, indicating greater Bayesian coherence

02

Model size correlates positively with the consistency of belief updates

03

The results have implications for how LLMs are understood and governed

Abstract

Do larger and more capable language models learn to update their "beliefs" about propositions more consistently with Bayes' theorem when presented with evidence in-context? To test this, we formulate a Bayesian Coherence Coefficient (BCC) metric and generate a dataset with which to measure the BCC. We measure BCC for multiple pre-trained-only language models across five model families, comparing against the number of model parameters, the amount of training data, and model scores on common benchmarks. Our results provide evidence for our hypothesis that larger and more capable pre-trained language models assign credences that are more coherent with Bayes' theorem. These results have important implications for our understanding and governance of LLMs.
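The abstract does not spell out how the BCC is computed, so the sketch below does not reproduce it. It only illustrates what "coherent with Bayes' theorem" means for a single proposition H and a piece of evidence E, assuming two exhaustive hypotheses (H and not-H) and credences expressed as probabilities. In log-odds form, a Bayes-coherent update satisfies logit(P(H|E)) - logit(P(H)) = log(P(E|H) / P(E|not-H)). The function `coherence_slope` is a hypothetical stand-in score (a least-squares slope through the origin of observed versus Bayes-implied log-odds shifts), not the paper's BCC.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("probability must be in (0, 1)")
    return math.log(p / (1.0 - p))


def bayes_posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """P(H|E) via Bayes' theorem, assuming H and not-H are exhaustive."""
    numerator = p_e_given_h * prior
    # P(E) by the law of total probability over H and not-H.
    evidence = numerator + p_e_given_not_h * (1.0 - prior)
    return numerator / evidence


def log_likelihood_ratio(p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Log-odds shift that Bayes' theorem implies for evidence E."""
    return math.log(p_e_given_h / p_e_given_not_h)


def coherence_slope(observed_shifts: list[float], implied_shifts: list[float]) -> float:
    """HYPOTHETICAL score, not the paper's BCC.

    Least-squares slope through the origin of observed log-odds shifts
    (e.g. taken from a model's credences before and after seeing evidence)
    against the shifts Bayes' theorem implies. A value of 1.0 means the
    observed shifts match the Bayesian ones on average.
    """
    if len(observed_shifts) != len(implied_shifts):
        raise ValueError("inputs must have the same length")
    num = sum(o * i for o, i in zip(observed_shifts, implied_shifts))
    den = sum(i * i for i in implied_shifts)
    if den == 0.0:
        raise ValueError("implied shifts must not all be zero")
    return num / den
```

For example, with a prior of 0.3, P(E|H) = 0.8 and P(E|not-H) = 0.2, `bayes_posterior(0.3, 0.8, 0.2)` gives 0.24 / 0.38 (about 0.632), and the log-odds shift equals log(4), the log-likelihood ratio. Working in log-odds is a convenient design choice here because Bayes' theorem makes the update additive, so coherence can be checked by comparing two shifts rather than two nonlinear posteriors.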

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI)