What Do Language Models Learn in Context? The Structured Task Hypothesis

Jiaoda Li; Yifan Hou; Mrinmaya Sachan; Ryan Cotterell

arXiv:2406.04216 · cs.CL · August 6, 2024 · 1 citation

TL;DR

This paper investigates how large language models learn from in-context examples. It tests three hypotheses and finds evidence that the models learn new tasks by composing tasks learned during pre-training.

Contribution

The study empirically evaluates three hypotheses about in-context learning. It presents counterexamples against two of them (task selection and meta-learning) and provides evidence that LLMs learn new tasks by composing tasks learned during pre-training.

Findings

01 Counterexamples invalidate the task selection and meta-learning hypotheses.

02 Evidence supports the task composition hypothesis.

03 LLMs can learn new tasks by combining tasks learned during pre-training.

Abstract

Large language models (LLMs) exhibit an intriguing ability to learn a novel task from in-context examples presented in a demonstration, termed in-context learning (ICL). Understandably, a swath of research has been dedicated to uncovering the theories underpinning ICL. One popular hypothesis explains ICL by task selection. LLMs identify the task based on the demonstration and generalize it to the prompt. Another popular hypothesis is that ICL is a form of meta-learning, i.e., the models learn a learning algorithm at pre-training time and apply it to the demonstration. Finally, a third hypothesis argues that LLMs use the demonstration to select a composition of tasks learned during pre-training to perform ICL. In this paper, we empirically explore these three hypotheses that explain LLMs' ability to learn in context with a suite of experiments derived from common text classification…

Code & Models

Repositories

eth-lre/llm_icl (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling