What Do Language Models Learn in Context? The Structured Task Hypothesis
Jiaoda Li, Yifan Hou, Mrinmaya Sachan, Ryan Cotterell

TL;DR
This paper investigates how large language models learn from in-context examples by testing three hypotheses (task selection, meta-learning, and task composition), and finds evidence that they compose tasks learned during pre-training to learn new ones.
Contribution
The study empirically evaluates three hypotheses about in-context learning and provides evidence that LLMs learn new tasks by composing pre-trained tasks, invalidating two alternative explanations.
Findings
Counterexamples invalidate the task selection and meta-learning hypotheses.
Evidence supports the task composition hypothesis.
LLMs can learn new tasks in context by composing tasks learned during pre-training.
Abstract
Large language models (LLMs) exhibit an intriguing ability to learn a novel task from in-context examples presented in a demonstration, termed in-context learning (ICL). Understandably, a swath of research has been dedicated to uncovering the theories underpinning ICL. One popular hypothesis explains ICL by task selection. LLMs identify the task based on the demonstration and generalize it to the prompt. Another popular hypothesis is that ICL is a form of meta-learning, i.e., the models learn a learning algorithm at pre-training time and apply it to the demonstration. Finally, a third hypothesis argues that LLMs use the demonstration to select a composition of tasks learned during pre-training to perform ICL. In this paper, we empirically explore these three hypotheses that explain LLMs' ability to learn in context with a suite of experiments derived from common text classification…
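To make the abstract's terminology concrete, the following is a minimal sketch of how a demonstration (labeled in-context examples) and a prompt (the unlabeled query) might be assembled for a text classification task. The function name `build_icl_prompt`, the "Input:/Label:" template, and the example sentences are illustrative assumptions, not the format used in the paper.

```python
# Hypothetical helper: formats a demonstration and a query into one ICL input.
# The "Input:/Label:" template is an assumption made for illustration only.
def build_icl_prompt(demonstration, query):
    """Join (text, label) pairs, then append the unlabeled query."""
    blocks = [f"Input: {text}\nLabel: {label}" for text, label in demonstration]
    # The query has no label; the model is expected to complete it.
    blocks.append(f"Input: {query}\nLabel:")
    return "\n\n".join(blocks)


# Made-up sentiment examples standing in for a text classification task.
demonstration = [
    ("The movie was wonderful.", "positive"),
    ("I would not watch it again.", "negative"),
]
prompt = build_icl_prompt(demonstration, "A delightful surprise.")
print(prompt)
```

Under the three hypotheses, the same input would be read differently: as evidence for picking one pre-trained task, as training data for a learned learning algorithm, or as a signal for selecting a composition of pre-trained tasks.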
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
