An Explanation of In-context Learning as Implicit Bayesian Inference

Sang Michael Xie; Aditi Raghunathan; Percy Liang; Tengyu Ma

arXiv:2111.02080 · cs.CL · July 22, 2022 · 20 cites

TL;DR

This paper explains in-context learning in large language models as implicit Bayesian inference: the model infers a latent document-level concept during pretraining and, at test time, infers the concept shared by the examples in a prompt.

Contribution

It provides a theoretical framework connecting in-context learning to Bayesian inference of latent variables, supported by experiments on synthetic and real datasets.
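One way to write the Bayesian-inference view in symbols is the posterior predictive below. The symbol θ for the latent concept and the words "prompt" and "output" are notation chosen here for illustration and should be checked against the paper's own definitions.

```latex
p(\text{output} \mid \text{prompt})
  = \int_{\theta} p(\text{output} \mid \theta, \text{prompt}) \,
    p(\theta \mid \text{prompt}) \, d\theta
```

Read this way, in-context learning corresponds to the posterior p(θ | prompt) concentrating on the concept shared by the prompt's examples, so that the prediction is dominated by that concept.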

Findings

01

In-context learning emerges from inference of shared latent concepts.

02

Model scaling improves in-context performance.

03

In-context learning is sensitive to example order.

Abstract

Large language models (LMs) such as GPT-3 have the surprising ability to do in-context learning, where the model learns to do a downstream task simply by conditioning on a prompt consisting of input-output examples. The LM learns from these examples without being explicitly pretrained to learn. Thus, it is unclear what enables in-context learning. In this paper, we study how in-context learning can emerge when pretraining documents have long-range coherence. Here, the LM must infer a latent document-level concept to generate coherent next tokens during pretraining. At test time, in-context learning occurs when the LM also infers a shared latent concept between examples in a prompt. We prove when this occurs despite a distribution mismatch between prompts and pretraining data in a setting where the pretraining distribution is a mixture of HMMs. In contrast to messy large-scale datasets…
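The idea of inferring a latent concept from a prompt can be illustrated with a toy mixture of HMMs. The sketch below is not the paper's code or experimental setup: the two concepts, their parameters, the uniform prior, and all function names are hypothetical, and it ignores the prompt/pretraining distribution mismatch that the paper analyzes. It only shows the mechanics of computing a posterior over concepts with the forward algorithm and mixing next-token predictions by that posterior.

```python
import math

def forward_loglik(tokens, init, trans, emit):
    """Scaled forward algorithm for one HMM.

    Returns (log-likelihood of tokens, filtered distribution over the
    final hidden state). Assumes tokens is non-empty.
    """
    n = len(init)
    alpha = [init[s] * emit[s][tokens[0]] for s in range(n)]
    loglik = 0.0
    for t in range(len(tokens)):
        if t > 0:
            # Propagate one step through the transitions, then weight by emission.
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][tokens[t]]
                for s in range(n)
            ]
        norm = sum(alpha)
        loglik += math.log(norm)
        alpha = [a / norm for a in alpha]  # renormalize to avoid underflow
    return loglik, alpha

def concept_posterior(tokens, concepts, prior):
    """Posterior over concepts: p(concept | tokens) ∝ prior * likelihood."""
    logs = [
        math.log(p) + forward_loglik(tokens, *c)[0]
        for p, c in zip(prior, concepts)
    ]
    m = max(logs)
    weights = [math.exp(v - m) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]

def next_token_probs(tokens, concepts, prior):
    """Posterior-weighted mixture of each concept's next-token distribution."""
    post = concept_posterior(tokens, concepts, prior)
    vocab = len(concepts[0][2][0])
    probs = [0.0] * vocab
    for w, (init, trans, emit) in zip(post, concepts):
        _, alpha = forward_loglik(tokens, init, trans, emit)
        n = len(init)
        nxt = [sum(alpha[r] * trans[r][s] for r in range(n)) for s in range(n)]
        for v in range(vocab):
            probs[v] += w * sum(nxt[s] * emit[s][v] for s in range(n))
    return probs

# Hypothetical toy concepts over a two-token vocabulary {0, 1}.
# Each concept is (initial distribution, transition matrix, emission matrix).
EMIT = [[0.9, 0.1], [0.1, 0.9]]       # state i mostly emits token i
STICKY = ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], EMIT)       # tends to repeat
ALTERNATING = ([0.5, 0.5], [[0.1, 0.9], [0.9, 0.1]], EMIT)  # tends to switch
CONCEPTS = [STICKY, ALTERNATING]
PRIOR = [0.5, 0.5]                    # assumed uniform prior over concepts

if __name__ == "__main__":
    prompt = [0, 1, 0, 1, 0, 1]
    print("posterior over concepts:", concept_posterior(prompt, CONCEPTS, PRIOR))
    print("next-token distribution:", next_token_probs(prompt, CONCEPTS, PRIOR))
```

With an alternating prompt the posterior mass should shift toward the alternating concept, and the predicted next token follows that concept even though no parameters were updated. This is the sense in which conditioning alone can behave like learning in this toy setting.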

Code & Models

Repositories

p-lambda/incontext-learning
jax · Official

Videos

An Explanation of In-context Learning as Implicit Bayesian Inference · slideslive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Attention Is All You Need · Linear Layer · Cosine Annealing · Label Smoothing · Residual Connection · Dense Connections · Multi-Head Attention · Layer Normalization