Context Collapse: In-Context Learning and Model Collapse
Josef Ott

TL;DR
This thesis explores in-context learning and model collapse in large language models, revealing phase transitions, the role of skew-symmetric components, and the impact of data growth on model stability.
Contribution
The thesis provides a theoretical analysis of ICL phase transitions and introduces the concept of context collapse, which links model dynamics to long-term stability issues.
Findings
Phase transition in the learned in-context learning parameters at a critical context length
Skew-symmetric components induce gradient rotation during ICL
Model collapse occurs unless data grows sufficiently fast or is retained
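
The last finding can be illustrated with a small simulation of the Gaussian-fitting setting. This is only a sketch under assumptions not taken from the thesis: a 1-D Gaussian, maximum-likelihood fitting, a fixed sample size `n` per generation, and the hypothetical function names `fit_gaussian` and `simulate`. In the "replace" regime each generation is trained only on samples from the previous fit; in the "accumulate" regime all earlier data is retained.

```python
import numpy as np


def fit_gaussian(samples):
    """Maximum-likelihood estimates (mean, std) of a 1-D Gaussian."""
    # np.std uses ddof=0 by default, which is the (biased) MLE.
    return samples.mean(), samples.std()


def simulate(regime, generations=300, n=10, seed=0):
    """Return the fitted variance at each generation.

    regime="replace":    train only on the newest synthetic samples.
    regime="accumulate": train on all data seen so far (data is retained).
    """
    rng = np.random.default_rng(seed)
    pool = rng.normal(0.0, 1.0, size=n)  # generation 0: real data from N(0, 1)
    variances = []
    for _ in range(generations):
        mu, sigma = fit_gaussian(pool)
        variances.append(sigma ** 2)
        synthetic = rng.normal(mu, sigma, size=n)  # sample from the fitted model
        if regime == "replace":
            pool = synthetic
        elif regime == "accumulate":
            pool = np.concatenate([pool, synthetic])
        else:
            raise ValueError(f"unknown regime: {regime!r}")
    return np.array(variances)


if __name__ == "__main__":
    for regime in ("replace", "accumulate"):
        v = simulate(regime)
        print(f"{regime:>10}: first variance {v[0]:.3g}, last variance {v[-1]:.3g}")
```

With these illustrative settings the fitted variance in the replace regime shrinks toward zero over the generations, while the accumulate regime keeps it at a non-degenerate value. The sketch does not reproduce the thesis's growth-rate conditions.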
Abstract
This thesis investigates two key phenomena in large language models (LLMs): in-context learning (ICL) and model collapse. We study ICL in a linear transformer with tied weights trained on linear regression tasks, and show that minimising the in-context loss leads to a phase transition in the learned parameters. Above a critical context length, the solution develops a skew-symmetric component. We prove this by reducing the forward pass of the linear transformer under weight tying to preconditioned gradient descent, and then analysing the optimal preconditioner. This preconditioner includes a skew-symmetric component, which induces a rotation of the gradient direction. For model collapse, we use martingale and random walk theory to analyse simplified settings (linear regression and Gaussian fitting) under both replacing and cumulative data regimes. We strengthen existing results by…
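
The rotation effect described in the abstract can be checked numerically. The sketch below takes one preconditioned gradient step from w = 0 on an in-context linear regression problem and compares a plain step with one whose preconditioner has a skew-symmetric part. The preconditioner used here (identity plus a random skew-symmetric matrix) is a hypothetical stand-in, not the optimal preconditioner derived in the thesis, and all function names are illustrative.

```python
import numpy as np


def gradient_at_zero(X, y):
    """Gradient of L(w) = 1/(2n) * sum_i (w @ x_i - y_i)^2 evaluated at w = 0."""
    n = X.shape[0]
    return -(X.T @ y) / n


def preconditioned_step(X, y, P):
    """One preconditioned gradient step from w = 0: w_1 = -P @ grad L(0)."""
    return -P @ gradient_at_zero(X, y)


def split_symmetric_skew(P):
    """Decompose P into its symmetric and skew-symmetric parts."""
    return 0.5 * (P + P.T), 0.5 * (P - P.T)


def angle_between(u, v):
    """Angle in radians between two nonzero vectors."""
    cos = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos, -1.0, 1.0))


rng = np.random.default_rng(0)
d, n = 4, 32                      # illustrative dimension and context length
w_star = rng.normal(size=d)       # hypothetical task vector
X = rng.normal(size=(n, d))       # in-context inputs
y = X @ w_star                    # noiseless in-context targets

M = rng.normal(size=(d, d))
A = 0.5 * (M - M.T)               # skew-symmetric: A.T == -A
P = np.eye(d) + A                 # assumed preconditioner, for illustration only

g = gradient_at_zero(X, y)
step_plain = preconditioned_step(X, y, np.eye(d))
step_skew = preconditioned_step(X, y, P)

print("g . (A g)      =", g @ (A @ g))
print("rotation angle =", angle_between(step_plain, step_skew), "rad")
```

Because g·(A g) = 0 for any skew-symmetric A, the skew part adds a component orthogonal to the gradient, so the update direction is turned away from the plain descent direction rather than merely rescaled along it.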
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Machine Learning and Algorithms
