What In-Context Learning "Learns" In-Context: Disentangling Task Recognition and Task Learning

Jane Pan; Tianyu Gao; Howard Chen; Danqi Chen

arXiv:2305.09731 · cs.CL · May 18, 2023 · 1 cites

TL;DR

This paper investigates how large language models perform in-context learning by disentangling their abilities to recognize tasks from demonstrations and to learn new input-label mappings, revealing distinct roles and scaling behaviors.

Contribution

It introduces a framework to distinguish task recognition from task learning in ICL, demonstrating their separate effects and how they scale with model size and demonstrations.

Findings

1. Models can recognize tasks without ground-truth labels.
2. Task recognition does not further improve with larger models or more demonstrations.
3. Task learning improves with model scale and more demonstrations.

Abstract

Large language models (LLMs) exploit in-context learning (ICL) to solve tasks with only a few demonstrations, but its mechanisms are not yet well understood. Some works suggest that LLMs only recall already learned concepts from pre-training, while others hint that ICL performs implicit learning over demonstrations. We characterize two ways through which ICL leverages demonstrations. Task recognition (TR) captures the extent to which LLMs can recognize a task through demonstrations -- even without ground-truth labels -- and apply their pre-trained priors, whereas task learning (TL) is the ability to capture new input-label mappings unseen in pre-training. Using a wide range of classification datasets and three LLM families (GPT-3, LLaMA and OPT), we design controlled experiments to disentangle the roles of TR and TL in ICL. We show that (1) models can achieve non-trivial performance…
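The distinction between TR and TL can be made concrete by varying only the labels shown in the demonstrations. The sketch below is one plausible way to build such prompts, assuming three labeling settings: gold labels, labels drawn at random from the label space (isolating TR, since the demonstrations carry no correct input-label mapping), and abstract symbols substituted for the label words (isolating TL, since the mapping cannot come from pre-trained priors about the label names). The function names `relabel` and `build_prompt` and the `Input:`/`Label:` template are hypothetical illustrations, not taken from the authors' repository.

```python
import random


def relabel(demos, label_space, setting, seed=0):
    """Return demonstrations with labels rewritten for one setting.

    demos: list of (text, gold_label) pairs.
    setting: "gold", "random", or "abstract" (hypothetical names).
    """
    rng = random.Random(seed)
    if setting == "gold":
        # Keep the ground-truth labels unchanged.
        return list(demos)
    if setting == "random":
        # Sample each label uniformly from the label space, ignoring the gold label.
        return [(text, rng.choice(label_space)) for text, _ in demos]
    if setting == "abstract":
        # Replace each label word with an arbitrary symbol, consistently.
        symbols = [str(i) for i in range(len(label_space))]
        rng.shuffle(symbols)
        mapping = dict(zip(label_space, symbols))
        return [(text, mapping[label]) for text, label in demos]
    raise ValueError(f"unknown setting: {setting}")


def build_prompt(demos, query):
    """Concatenate demonstrations and a final unlabeled query (assumed template)."""
    blocks = [f"Input: {text}\nLabel: {label}" for text, label in demos]
    blocks.append(f"Input: {query}\nLabel:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    demos = [("great movie", "positive"), ("dull plot", "negative")]
    for setting in ("gold", "random", "abstract"):
        shown = relabel(demos, ["positive", "negative"], setting)
        print(f"--- {setting} ---")
        print(build_prompt(shown, "a fine film"))
```

Comparing a model's accuracy across the three prompt variants, at different model sizes and demonstration counts, is the kind of controlled comparison the abstract describes. In the abstract setting, predictions would need to be mapped back through the same symbol assignment before scoring.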

Code & Models

Repositories

princeton-nlp/whaticllearns (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications

Methods: Hierarchical Information Threading