Illusion or Algorithm? Investigating Memorization, Emergence, and Symbolic Processing in In-Context Learning
Jingcheng Niu, Subhabrata Dutta, Ahmed Elshabrawy, Harish Tayyar Madabushi, Iryna Gurevych

TL;DR
This paper investigates the mechanisms behind in-context learning in large language models, revealing it is neither just memorization nor a symbolic algorithm, but a complex interplay of factors clarified through systematic analysis.
Contribution
The study introduces a new suite of investigative tasks and a novel method to analyze in-context learning, providing deeper mechanistic insights beyond prior assumptions.
Findings
ICL extends beyond simple memorization of training data.
ICL does not implement an independent symbolic algorithm.
Training dynamics and model capabilities influence ICL performance.
Abstract
Large-scale Transformer language models (LMs) trained solely on next-token prediction with web-scale data can solve a wide range of tasks after seeing just a few examples. The mechanism behind this capability, known as in-context learning (ICL), remains both controversial and poorly understood. Some studies argue that it is merely the result of memorizing vast amounts of data, while others contend that it reflects a fundamental, symbolic algorithmic development in LMs. In this work, we introduce a suite of investigative tasks and a novel method to systematically investigate ICL by leveraging the full Pythia scaling suite, including interim checkpoints that capture progressively larger amount of training data. By carefully exploring ICL performance on downstream tasks and simultaneously conducting a mechanistic analysis of the residual stream's subspace, we demonstrate that ICL extends…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsInnovative Teaching and Learning Methods · Education and Critical Thinking Development
MethodsAttention Is All You Need · Linear Layer · Multi-Head Attention · Dense Connections · Layer Normalization · Pythia · Byte Pair Encoding · Label Smoothing · Adam · Softmax
