Martingale Score: An Unsupervised Metric for Bayesian Rationality in LLM Reasoning
Zhonghao He, Tianyi Qiu, Hirokazu Shirado, Maarten Sap

TL;DR
This paper introduces the Martingale Score, an unsupervised metric for evaluating Bayesian rationality in LLM reasoning. Across various open-ended tasks, it reveals widespread belief entrenchment, and the score correlates with ground-truth accuracy.
Contribution
The study proposes a novel, unsupervised metric, based on the Martingale property of Bayesian belief updating, to assess how LLMs update their beliefs. Applying it shows that belief entrenchment is prevalent.
Findings
Belief entrenchment is widespread across models and domains.
The Martingale Score correlates with ground-truth accuracy.
Models vary in susceptibility to belief entrenchment.
Abstract
Recent advances in reasoning techniques have substantially improved the performance of large language models (LLMs), raising expectations for their ability to provide accurate, truthful, and reliable information. However, emerging evidence suggests that iterative reasoning may foster belief entrenchment and confirmation bias, rather than enhancing truth-seeking behavior. In this study, we propose a systematic evaluation framework for belief entrenchment in LLM reasoning by leveraging the Martingale property from Bayesian statistics. This property implies that, under rational belief updating, the expected value of future beliefs should remain equal to the current belief, i.e., belief updates are unpredictable from the current belief. We propose the unsupervised, regression-based Martingale Score to measure violations of this property, which signal deviation from the Bayesian ability of…
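The abstract excerpt above is cut off before it defines the score, so the following is only a minimal sketch of the general idea it describes, not the authors' definition. It assumes each reasoning episode yields a prior belief and a posterior belief, both probabilities in [0, 1], and it regresses the belief update (posterior minus prior) on the prior. Under the Martingale property the update should be unpredictable from the prior, so the fitted slope should be close to zero. The function name `update_slope` and the toy data are hypothetical.

```python
from typing import Sequence


def update_slope(priors: Sequence[float], posteriors: Sequence[float]) -> float:
    """Ordinary least squares slope of (posterior - prior) regressed on prior.

    Illustrative assumption: a slope near zero is consistent with the
    Martingale property, while a positive slope means updates tend to move
    further in the direction the prior already leaned.
    """
    if len(priors) != len(posteriors) or len(priors) < 2:
        raise ValueError("need at least two matched prior/posterior pairs")

    updates = [q - p for p, q in zip(priors, posteriors)]
    n = len(priors)
    mean_prior = sum(priors) / n
    mean_update = sum(updates) / n

    # Sum of squared deviations of the regressor (the prior belief).
    var_prior = sum((p - mean_prior) ** 2 for p in priors)
    if var_prior == 0:
        raise ValueError("priors must not all be identical")

    # Sum of cross deviations between prior and update.
    cov = sum((p - mean_prior) * (u - mean_update) for p, u in zip(priors, updates))
    return cov / var_prior


# Toy data (made up for illustration): every posterior moves further from 0.5
# in the direction of its prior, so the update is predictable from the prior.
entrenched = update_slope([0.2, 0.4, 0.6, 0.8], [0.10, 0.35, 0.65, 0.90])

# Toy data: updates of the same size whose sign is unrelated to the prior.
unpredictable = update_slope([0.2, 0.4, 0.6, 0.8], [0.25, 0.35, 0.55, 0.85])
```

In this toy setup `entrenched` comes out positive and `unpredictable` comes out at approximately zero. How the paper normalizes, aggregates, or tests such a coefficient is not stated in the excerpt.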
Taxonomy
Topics: Topic Modeling · Forecasting Techniques and Applications · Explainable Artificial Intelligence (XAI)
