What In-Context Learning "Learns" In-Context: Disentangling Task Recognition and Task Learning
Jane Pan, Tianyu Gao, Howard Chen, Danqi Chen

TL;DR
This paper investigates how large language models perform in-context learning by disentangling two abilities: recognizing a task from demonstrations and learning new input-label mappings. It shows that the two play distinct roles and scale differently.
Contribution
It introduces a framework that separates task recognition from task learning in ICL, measuring the effect of each and how each scales with model size and with the number of demonstrations.
Findings
Through task recognition alone, models achieve non-trivial performance, even without ground-truth labels.
Task recognition does not improve further with larger models or more demonstrations.
Task learning emerges as models scale and improves consistently with more demonstrations.
Abstract
Large language models (LLMs) exploit in-context learning (ICL) to solve tasks with only a few demonstrations, but its mechanisms are not yet well-understood. Some works suggest that LLMs only recall already learned concepts from pre-training, while others hint that ICL performs implicit learning over demonstrations. We characterize two ways through which ICL leverages demonstrations. Task recognition (TR) captures the extent to which LLMs can recognize a task through demonstrations -- even without ground-truth labels -- and apply their pre-trained priors, whereas task learning (TL) is the ability to capture new input-label mappings unseen in pre-training. Using a wide range of classification datasets and three LLM families (GPT-3, LLaMA and OPT), we design controlled experiments to disentangle the roles of TR and TL in ICL. We show that (1) models can achieve non-trivial performance…
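The abstract defines TR as working "even without ground-truth labels" and TL as capturing "new input-label mappings unseen in pre-training". One way to operationalize those definitions is to vary only the demonstration labels. The sketch below is an illustrative assumption, not the authors' code: the helper names `relabel` and `build_prompt`, the `Input:`/`Label:` template, and the use of digits as abstract labels are all hypothetical. See the paper for its exact experimental settings.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical helpers for illustration; not taken from the paper's code release.
Demo = Tuple[str, str]  # (input text, gold label)


def relabel(demos: List[Demo], labels: List[str], mode: str,
            seed: int = 0) -> Tuple[List[Demo], Dict[str, str]]:
    """Rewrite demo labels for one probing condition.

    Returns the relabeled demos and the gold-label -> answer-token mapping
    that the model's prediction on the query should be scored against.
    """
    rng = random.Random(seed)
    identity = {lab: lab for lab in labels}
    if mode == "gold":
        # Ground-truth labels: both TR and TL can contribute.
        return list(demos), identity
    if mode == "random":
        # Labels drawn uniformly from the natural label set. The demos still show
        # the input distribution and label space but no correct mapping, so
        # accuracy above chance must come from task recognition.
        return [(x, rng.choice(labels)) for x, _ in demos], identity
    if mode == "abstract":
        # Gold labels swapped for arbitrary symbols (digits here, an assumption)
        # with no pre-trained meaning, so the model must learn the mapping in
        # context: this isolates task learning.
        symbols = [str(i) for i in range(len(labels))]
        rng.shuffle(symbols)
        mapping = dict(zip(labels, symbols))
        return [(x, mapping[y]) for x, y in demos], mapping
    raise ValueError(f"unknown mode: {mode}")


def build_prompt(demos: List[Demo], query: str) -> str:
    """Format demonstrations and the query as a plain-text ICL prompt."""
    blocks = [f"Input: {x}\nLabel: {y}" for x, y in demos]
    blocks.append(f"Input: {query}\nLabel:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    demos = [("a delightful film", "positive"), ("a tedious mess", "negative")]
    labels = ["negative", "positive"]
    for mode in ("gold", "random", "abstract"):
        shown, answer_map = relabel(demos, labels, mode)
        print(f"--- {mode} (answers scored via {answer_map})")
        print(build_prompt(shown, "an instant classic"))
```

Under this setup, comparing accuracy in the random condition with a no-demonstration or chance baseline estimates TR. Accuracy in the abstract condition, scored against `answer_map[gold_label]`, estimates TL.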
