Context Collapse: In-Context Learning and Model Collapse

Josef Ott

arXiv:2601.00923 · cs.AI · January 6, 2026

TL;DR

This thesis explores in-context learning and model collapse in large language models, revealing phase transitions, the role of skew-symmetric components, and the impact of data growth on model stability.

Contribution

The thesis provides a theoretical analysis of in-context learning (ICL) phase transitions and introduces the concept of context collapse, linking model dynamics to long-term stability issues.

Findings

01 Phase transition in the in-context learning parameters at a critical context length

02 Skew-symmetric components induce gradient rotation during ICL

03 Model collapse occurs unless data grows sufficiently fast or is retained
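Finding 03 can be illustrated with the Gaussian-fitting setting the abstract mentions. The simulation below is a minimal sketch under assumed choices, not the thesis's construction: each generation fits a Gaussian by maximum likelihood and samples n synthetic points from the fit, which then either replace the training set (replacing regime) or are appended to it (cumulative regime, where data is retained). The helper names `fit_gaussian` and `simulate`, the N(0, 1) starting data, the sample size, the number of generations and the seed are all hypothetical.

```python
import math
import random


def fit_gaussian(data):
    """Maximum-likelihood fit: sample mean and biased (divide-by-n) std."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n
    return mu, math.sqrt(var)


def simulate(generations, n, regime, seed=0):
    """Return the fitted standard deviation at each generation (toy model).

    regime="replace":    train each model only on the previous model's samples.
    regime="accumulate": train each model on all data so far (real + synthetic).
    """
    rng = random.Random(seed)
    data = [rng.gauss(0.0, 1.0) for _ in range(n)]  # "real" data from N(0, 1)
    sigmas = []
    for _ in range(generations):
        mu, sigma = fit_gaussian(data)
        sigmas.append(sigma)
        # The next generation sees n fresh samples drawn from the current fit.
        synthetic = [rng.gauss(mu, sigma) for _ in range(n)]
        if regime == "replace":
            data = synthetic
        elif regime == "accumulate":
            data = data + synthetic
        else:
            raise ValueError(f"unknown regime: {regime!r}")
    return sigmas


replace = simulate(generations=500, n=10, regime="replace")
accumulate = simulate(generations=500, n=10, regime="accumulate")
print(f"replace:    sigma after 500 generations = {replace[-1]:.3e}")
print(f"accumulate: sigma after 500 generations = {accumulate[-1]:.3e}")
```

Under these assumptions the replacing regime drives the fitted standard deviation towards zero: the maximum-likelihood variance of n samples has expectation (n-1)/n times the true variance, so the estimate shrinks geometrically in expectation and, for n = 10, by many orders of magnitude over 500 generations. In the cumulative regime each new batch is an ever smaller fraction of the training set, so the estimate stabilises instead of collapsing.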

Abstract

This thesis investigates two key phenomena in large language models (LLMs): in-context learning (ICL) and model collapse. We study ICL in a linear transformer with tied weights trained on linear regression tasks, and show that minimising the in-context loss leads to a phase transition in the learned parameters. Above a critical context length, the solution develops a skew-symmetric component. We prove this by reducing the forward pass of the linear transformer under weight tying to preconditioned gradient descent, and then analysing the optimal preconditioner. This preconditioner includes a skew-symmetric component, which induces a rotation of the gradient direction. For model collapse, we use martingale and random walk theory to analyse simplified settings (linear regression and Gaussian fitting) under both replacing and cumulative data regimes. We strengthen existing results by…
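The reduction to preconditioned gradient descent can be made concrete with the generic one-step picture of in-context linear regression. This is a minimal sketch, assuming a single update from w = 0 on the in-context least-squares loss L(w) = (1/2n) Σ (y_i - wᵀx_i)²; the thesis's exact parametrisation under weight tying may differ. With g = (1/n) Σ y_i x_i the negative gradient at zero, the prediction is ŷ = x_qᵀ Γ g. Splitting Γ = A + S into symmetric and skew-symmetric parts shows why S rotates the step: vᵀ S v = 0 for every v, so S g is orthogonal to g. The preconditioner Γ, the toy data and all helper names (`icl_prediction`, `split_sym_skew`, `angle`) are hypothetical.

```python
import math


def matvec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def split_sym_skew(m):
    """Return (A, S) with A symmetric, S skew-symmetric and m = A + S."""
    d = len(m)
    a = [[(m[i][j] + m[j][i]) / 2 for j in range(d)] for i in range(d)]
    s = [[(m[i][j] - m[j][i]) / 2 for j in range(d)] for i in range(d)]
    return a, s


def icl_prediction(gamma, xs, ys, x_query):
    """One preconditioned GD step from w = 0 on L(w) = (1/2n) sum (y_i - w.x_i)^2.

    The negative gradient at w = 0 is g = (1/n) sum y_i x_i, so the step gives
    w = gamma g and the prediction x_query . w.
    """
    n, d = len(xs), len(x_query)
    g = [sum(y * x[k] for x, y in zip(xs, ys)) / n for k in range(d)]
    w = matvec(gamma, g)
    return dot(x_query, w), g, w


def angle(u, v):
    """Angle between two nonzero vectors, in radians."""
    c = dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))
    return math.acos(max(-1.0, min(1.0, c)))


# Hypothetical noiseless 2-d task y = w_true . x with four context examples.
w_true = [1.0, -2.0]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
ys = [dot(w_true, x) for x in xs]

a = 0.5  # illustrative strength of the skew-symmetric part
gamma = [[1.0, -a], [a, 1.0]]  # identity plus a * (90-degree rotation)
A, S = split_sym_skew(gamma)

pred, g, w = icl_prediction(gamma, xs, ys, x_query=[1.0, 1.0])
print(dot(matvec(S, g), g))             # skew part is orthogonal to g (zero up to rounding)
print(angle(w, g), math.atan(a))        # step is rotated by atan(a) relative to g
print(dot(g, w), dot(g, matvec(A, g)))  # first-order decrease only sees A
```

With Γ = I + aJ, where J is the 90-degree rotation, the step Γ g is g rotated by atan(a) and scaled by sqrt(1 + a²), while the first-order decrease gᵀ Γ g = gᵀ A g is the same as without the skew part. When and why the optimal preconditioner acquires such a component (the phase transition above a critical context length) is the thesis's result and is not reproduced here.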

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Machine Learning and Algorithms