An Explanation of In-context Learning as Implicit Bayesian Inference
Sang Michael Xie, Aditi Raghunathan, Percy Liang, Tengyu Ma

TL;DR
This paper explains in-context learning in large language models as an implicit form of Bayesian inference, linking it to the inference of latent document-level concepts during pretraining and at test time.
Contribution
It provides a theoretical framework connecting in-context learning to Bayesian inference of latent variables, supported by experiments on synthetic and real datasets.
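The Bayesian view can be summarized as a marginalization over the latent concept. In the sketch below, θ stands for the concept (the symbol choice is ours). On this reading, in-context learning succeeds when the posterior p(θ | prompt) concentrates on the concept shared by the prompt examples as more examples are added. The model then effectively "selects" that concept when predicting the output.

```latex
p(\text{output} \mid \text{prompt})
  = \int_{\theta} p(\text{output} \mid \theta, \text{prompt}) \, p(\theta \mid \text{prompt}) \, d\theta
```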
Findings
In-context learning emerges from inference of shared latent concepts.
Model scaling improves in-context performance.
In-context learning is sensitive to example order.
Abstract
Large language models (LMs) such as GPT-3 have the surprising ability to do in-context learning, where the model learns to do a downstream task simply by conditioning on a prompt consisting of input-output examples. The LM learns from these examples without being explicitly pretrained to learn. Thus, it is unclear what enables in-context learning. In this paper, we study how in-context learning can emerge when pretraining documents have long-range coherence. Here, the LM must infer a latent document-level concept to generate coherent next tokens during pretraining. At test time, in-context learning occurs when the LM also infers a shared latent concept between examples in a prompt. We prove when this occurs despite a distribution mismatch between prompts and pretraining data in a setting where the pretraining distribution is a mixture of HMMs. In contrast to messy large-scale datasets…
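To make the mixture-of-HMMs setting concrete, here is a minimal toy sketch. It is not the authors' code, and every parameter is hypothetical. Each "concept" is a small discrete HMM. The posterior over concepts given a prompt comes from Bayes' rule, with each concept's likelihood computed by the standard forward algorithm. The next-token prediction averages each concept's prediction, weighted by that posterior. The sketch deliberately ignores the prompt/pretraining distribution mismatch (delimiters, independently sampled examples) that the paper's analysis handles.

```python
from typing import List, Sequence

Matrix = List[List[float]]


class HMM:
    """One latent 'concept': a discrete HMM with hypothetical toy parameters."""

    def __init__(self, init: Sequence[float], trans: Matrix, emit: Matrix) -> None:
        self.init = list(init)  # init[j] = p(h_1 = j)
        self.trans = trans      # trans[i][j] = p(h_{t+1} = j | h_t = i)
        self.emit = emit        # emit[j][o] = p(o_t = o | h_t = j)

    def forward(self, obs: Sequence[int]) -> List[float]:
        """Forward algorithm: alpha[j] = p(o_1..o_T, h_T = j). obs must be non-empty.

        No rescaling is done, so very long sequences would underflow.
        """
        n = len(self.init)
        alpha = [self.init[j] * self.emit[j][obs[0]] for j in range(n)]
        for o in obs[1:]:
            alpha = [
                sum(alpha[i] * self.trans[i][j] for i in range(n)) * self.emit[j][o]
                for j in range(n)
            ]
        return alpha

    def likelihood(self, obs: Sequence[int]) -> float:
        """p(o_1..o_T | concept)."""
        return sum(self.forward(obs))

    def next_token_probs(self, obs: Sequence[int]) -> List[float]:
        """p(o_{T+1} = o | o_1..o_T, concept) for every token o."""
        alpha = self.forward(obs)
        total = sum(alpha)
        n = len(self.init)
        filtered = [a / total for a in alpha]  # p(h_T = i | o_1..o_T)
        predicted = [sum(filtered[i] * self.trans[i][j] for i in range(n)) for j in range(n)]
        n_tokens = len(self.emit[0])
        return [sum(predicted[j] * self.emit[j][o] for j in range(n)) for o in range(n_tokens)]


def concept_posterior(concepts: Sequence[HMM], prior: Sequence[float], obs: Sequence[int]) -> List[float]:
    """Bayes' rule: p(concept | prompt) is proportional to p(concept) * p(prompt | concept)."""
    joint = [p * c.likelihood(obs) for c, p in zip(concepts, prior)]
    z = sum(joint)
    return [j / z for j in joint]


def predictive(concepts: Sequence[HMM], prior: Sequence[float], obs: Sequence[int]) -> List[float]:
    """Marginalize over concepts: sum_c p(c | prompt) * p(next token | prompt, c)."""
    post = concept_posterior(concepts, prior, obs)
    out = [0.0] * len(concepts[0].emit[0])
    for weight, concept in zip(post, concepts):
        for o, p in enumerate(concept.next_token_probs(obs)):
            out[o] += weight * p
    return out


# Hypothetical toy setup: 2 hidden states, 2 tokens; state j mostly emits token j.
EMIT = [[0.9, 0.1], [0.1, 0.9]]
alternate = HMM([0.5, 0.5], [[0.1, 0.9], [0.9, 0.1]], EMIT)  # concept: tokens alternate
repeat = HMM([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], EMIT)     # concept: tokens repeat
concepts = [alternate, repeat]
prior = [0.5, 0.5]

if __name__ == "__main__":
    prompt = [0, 1, 0, 1]
    print("p(concept | prompt):", concept_posterior(concepts, prior, prompt))
    print("p(next token | prompt):", predictive(concepts, prior, prompt))
```

With the prompt `[0, 1, 0, 1]`, the posterior places about 0.95 of its mass on the alternating concept, so the predictive distribution favors token 0 next. With the shorter prompt `[0, 1]`, the posterior is less concentrated (about 0.76). This loosely mirrors the idea that more in-context examples sharpen inference of the shared concept.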
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Attention Is All You Need · Linear Layer · Cosine Annealing · Label Smoothing · Residual Connection · Dense Connections · Multi-Head Attention · Layer Normalization
