Dual Operating Modes of In-Context Learning
Ziqian Lin, Kangwook Lee

TL;DR
This paper introduces a probabilistic model that explains both the task learning and task retrieval modes of in-context learning (ICL), and accounts for phenomena such as the 'early ascent', in which risk initially increases as more in-context samples are added.
Contribution
The paper presents a unified probabilistic framework for analyzing the dual operating modes of ICL, extends existing models of pretraining data to include multiple task groups and task-dependent input distributions, and uses the framework to explain phenomena observed in practice.
Findings
Closed-form task posterior distribution derived
Explains the 'early ascent' risk phenomenon in ICL
Validates theoretical predictions with experiments on Transformers
Abstract
In-context learning (ICL) exhibits dual operating modes: task learning, i.e., acquiring a new skill from in-context samples, and task retrieval, i.e., locating and activating a relevant pretrained skill. Recent theoretical work investigates various mathematical models to analyze ICL, but existing models explain only one operating mode at a time. We introduce a probabilistic model, with which one can explain the dual operating modes of ICL simultaneously. Focusing on in-context learning of linear functions, we extend existing models for pretraining data by introducing multiple task groups and task-dependent input distributions. We then analyze the behavior of the optimally pretrained model under the squared loss, i.e., the MMSE estimator of the label given in-context examples. Regarding pretraining task distribution as prior and in-context examples as the observation, we derive the…
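To make the "prior plus observation" view concrete, here is a minimal sketch of an MMSE predictor for in-context linear regression under a Gaussian-mixture task prior. It is an illustration under simplifying assumptions, not the paper's closed-form result: every mixture component shares an isotropic covariance, inputs are treated as fixed (the task-dependent input distributions described in the abstract are ignored), and the names `mmse_predict`, `mus`, `pis`, `sigma_w`, and `sigma_y` are hypothetical.

```python
import numpy as np

def mmse_predict(X, y, x_query, mus, pis, sigma_w=1.0, sigma_y=0.5):
    """MMSE label prediction under an assumed mixture-of-Gaussians task prior.

    Assumed model (illustrative only):
      w ~ sum_m pis[m] * N(mus[m], sigma_w^2 I),   y_i = x_i^T w + N(0, sigma_y^2).
    Returns the prediction at x_query and the posterior component weights.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mus = np.asarray(mus, dtype=float)
    pis = np.asarray(pis, dtype=float)
    k, d = X.shape

    # Posterior covariance of w is the same for every component,
    # because all components share the prior covariance sigma_w^2 I.
    precision = np.eye(d) / sigma_w**2 + X.T @ X / sigma_y**2
    cov_post = np.linalg.inv(precision)

    # Marginal covariance of y given a component (w integrated out).
    S = sigma_w**2 * (X @ X.T) + sigma_y**2 * np.eye(k)

    log_w = np.empty(len(pis))
    means = np.empty((len(pis), d))
    for m in range(len(pis)):
        r = y - X @ mus[m]
        # Log posterior weight up to a constant shared by all components.
        log_w[m] = np.log(pis[m]) - 0.5 * r @ np.linalg.solve(S, r)
        # Posterior mean of w within component m.
        means[m] = cov_post @ (mus[m] / sigma_w**2 + X.T @ y / sigma_y**2)

    # Normalize the weights in a numerically stable way.
    log_w -= log_w.max()
    weights = np.exp(log_w)
    weights /= weights.sum()

    # The MMSE estimate is the posterior-weighted average of component predictions.
    prediction = float(np.asarray(x_query, dtype=float) @ (weights @ means))
    return prediction, weights

# Small demo: two task clusters, in-context examples drawn from the first one.
rng = np.random.default_rng(0)
mus = np.array([[5.0, 5.0, 5.0], [-5.0, -5.0, -5.0]])
pis = np.array([0.5, 0.5])
X = rng.normal(size=(20, 3))
y = X @ mus[0]
pred, weights = mmse_predict(X, y, np.array([1.0, 0.0, 0.0]), mus, pis)
print(pred, weights)
```

In this toy setup, the reweighting of prior components loosely corresponds to task retrieval and the within-component update of the mean to task learning, though the paper's analysis is more general than this sketch.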
Taxonomy
Topics: Machine Learning and Algorithms
