Dual Operating Modes of In-Context Learning

Ziqian Lin; Kangwook Lee

arXiv:2402.18819 · cs.LG · August 5, 2024 · 1 cites

TL;DR

This paper introduces a probabilistic model that explains both the task learning and task retrieval modes of in-context learning, and accounts for phenomena such as the risk initially increasing as more in-context samples are added.

Contribution

It presents a unified probabilistic framework for analyzing the dual modes of ICL, extending existing models to include multiple task groups and input distributions, and explains observed practical phenomena.

Findings

01. Closed-form task posterior distribution derived
02. Explains the 'early ascent' risk phenomenon in ICL
03. Validates theoretical predictions with experiments on Transformers

Abstract

In-context learning (ICL) exhibits dual operating modes: task learning, i.e., acquiring a new skill from in-context samples, and task retrieval, i.e., locating and activating a relevant pretrained skill. Recent theoretical work investigates various mathematical models to analyze ICL, but existing models explain only one operating mode at a time. We introduce a probabilistic model, with which one can explain the dual operating modes of ICL simultaneously. Focusing on in-context learning of linear functions, we extend existing models for pretraining data by introducing multiple task groups and task-dependent input distributions. We then analyze the behavior of the optimally pretrained model under the squared loss, i.e., the MMSE estimator of the label given in-context examples. Regarding pretraining task distribution as prior and in-context examples as the observation, we derive the…
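
To make the "prior plus observation" view concrete, here is a minimal sketch of an MMSE label predictor for in-context linear regression under a Gaussian-mixture task prior. It is a simplified illustration, not the paper's exact model: it assumes every mixture component shares one isotropic covariance, it ignores the task-dependent input distributions mentioned above, and the names `mmse_predict`, `mus`, `pis`, `sigma_w`, and `sigma_y` are hypothetical.

```python
import numpy as np

def mmse_predict(X, y, x_query, mus, pis, sigma_w=1.0, sigma_y=0.5):
    """MMSE prediction of the query label under a Gaussian-mixture task prior.

    Assumed (simplified) model, not taken verbatim from the paper:
      w ~ sum_m pis[m] * N(mus[m], sigma_w^2 I),   y_i = x_i^T w + N(0, sigma_y^2).
    Returns the prediction and the posterior weight of each mixture component.
    """
    d = X.shape[1]
    # Posterior precision; shared by all components because they share sigma_w.
    A = np.eye(d) / sigma_w**2 + X.T @ X / sigma_y**2
    log_w = np.empty(len(pis))
    post_means = np.empty((len(pis), d))
    for m, (mu, pi) in enumerate(zip(mus, pis)):
        r = y - X @ mu                       # residuals relative to the component centre
        Xtr = X.T @ r
        shift = np.linalg.solve(A, Xtr)
        # r^T S^{-1} r with S = sigma_y^2 I + sigma_w^2 X X^T, via the Woodbury identity.
        quad = r @ r / sigma_y**2 - Xtr @ shift / sigma_y**4
        log_w[m] = np.log(pi) - 0.5 * quad   # log-det term is the same for every component
        post_means[m] = mu + shift / sigma_y**2
    w = np.exp(log_w - log_w.max())          # normalise in a numerically stable way
    w /= w.sum()
    return float(w @ (post_means @ x_query)), w

# Illustrative usage: two pretrained task centres, a true task far from both.
rng = np.random.default_rng(0)
mus = np.array([[1.0, 0.0], [-1.0, 0.0]])
pis = np.array([0.5, 0.5])
w_true = np.array([3.0, -2.0])
x_query = np.array([1.0, 0.5])
for k in (0, 2, 200):
    X = rng.normal(size=(k, 2))
    pred, weights = mmse_predict(X, X @ w_true, x_query, mus, pis)
    print(k, round(pred, 3), np.round(weights, 3))
```

With no in-context examples the prediction is just the prior-weighted average over the task centres, which loosely corresponds to retrieval. With many examples the data term dominates and the prediction approaches `x_query @ w_true`, which loosely corresponds to learning.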

Taxonomy

Topics: Machine Learning and Algorithms