Is In-Context Learning in Large Language Models Bayesian? A Martingale Perspective
Fabian Falck, Ziyu Wang, Chris Holmes

TL;DR
This paper investigates whether in-context learning in large language models behaves like Bayesian inference by analyzing the martingale property, providing theoretical checks and experimental evidence that challenge this hypothesis.
Contribution
It introduces a martingale-based framework to evaluate the Bayesian nature of ICL and presents empirical tests that reveal deviations from Bayesian behavior in LLMs.
Findings
Violations of the martingale property in experiments
Uncertainty does not decrease as expected with more data
ICL does not fully exhibit Bayesian scaling behavior
Abstract
In-context learning (ICL) has emerged as a particularly remarkable characteristic of Large Language Models (LLM): given a pretrained LLM and an observed dataset, LLMs can make predictions for new data points from the same distribution without fine-tuning. Numerous works have postulated ICL as approximately Bayesian inference, rendering this a natural hypothesis. In this work, we analyse this hypothesis from a new angle through the martingale property, a fundamental requirement of a Bayesian learning system for exchangeable data. We show that the martingale property is a necessary condition for unambiguous predictions in such scenarios, and enables a principled, decomposed notion of uncertainty vital in trustworthy, safety-critical systems. We derive actionable checks with corresponding theory and test statistics which must hold if the martingale property is satisfied. We also examine if…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsNatural Language Processing Techniques · Topic Modeling
