Scaling Laws and In-Context Learning: A Unified Theoretical Framework

Sushant Mehta; Ishan Gupta

arXiv:2511.06232 · cs.LG · November 11, 2025

TL;DR

This paper presents a unified theoretical framework linking scaling laws to in-context learning in transformers, revealing how model size and structure influence learning capabilities and phase transitions.

Contribution

The paper introduces a theory connecting scaling laws to the emergence of ICL. The theory covers phase transitions and optimal depth-width allocation at a fixed model size, and the paper validates it with systematic experiments.
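The abstract gives the optimal allocation only as proportionalities, $L^{*} \propto N^{2/3}$ and $d^{*} \propto N^{1/3}$. The sketch below evaluates those two power laws to show how depth and width grow with a parameter budget $N$. The prefactors `c_depth` and `c_width` are hypothetical placeholders because the excerpt does not state them, so only ratios are meaningful here, not absolute depths or widths.

```python
import math


def optimal_allocation(n_params: float, c_depth: float = 1.0, c_width: float = 1.0) -> tuple[float, float]:
    """Evaluate L* = c_depth * N**(2/3) and d* = c_width * N**(1/3).

    c_depth and c_width are hypothetical prefactors (assumed, not from the paper);
    only the exponents 2/3 and 1/3 come from the abstract.
    """
    if n_params <= 0:
        raise ValueError("n_params must be positive")
    depth = c_depth * n_params ** (2.0 / 3.0)
    width = c_width * n_params ** (1.0 / 3.0)
    return depth, width


if __name__ == "__main__":
    base_depth, base_width = optimal_allocation(1e6)
    for n in (1e6, 8e6, 64e6):
        depth, width = optimal_allocation(n)
        # Ratios relative to the smallest budget are independent of the prefactors.
        print(f"N={n:.0e}: depth x{depth / base_depth:.2f}, width x{width / base_width:.2f}")
```

Each 8x increase in $N$ multiplies $L^{*}$ by 4 and $d^{*}$ by 2, so the aspect ratio $L^{*}/d^{*}$ grows as $N^{1/3}$. The exponents sum to 1, which means $L^{*} d^{*} \propto N$. This is consistent with a budget in which parameters scale with the product $Ld$, but the excerpt does not state the paper's parameter-count model.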

Findings

01

ICL performance follows power-law scaling with model parameters.

02

Under specific conditions, transformers implement gradient-based meta-learning in their forward pass.

03

Sharp phase transitions occur at critical model scales.
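Finding 01 is a power-law claim. A common way to estimate such an exponent is ordinary least squares on log-transformed data, fitting $\mathrm{err} = a N^{-\alpha}$. The sketch below uses only the standard library and runs on synthetic, noise-free data from a hypothetical law with $a = 5$, $\alpha = 0.3$. These are not the paper's measurements or its fitting procedure.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**(-alpha) by least squares on (log x, log y); return (a, alpha).

    Inputs must be positive, paired, and contain at least two distinct x values.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired points")
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxx = sum((u - mx) ** 2 for u in lx)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    slope = sxy / sxx              # slope of log y against log x equals -alpha
    intercept = my - slope * mx    # intercept equals log a
    return math.exp(intercept), -slope


if __name__ == "__main__":
    # Synthetic data from the hypothetical law err = 5 * N**-0.3 (illustration only).
    sizes = [1e6, 1e7, 1e8, 1e9]
    errors = [5.0 * n ** -0.3 for n in sizes]
    a, alpha = fit_power_law(sizes, errors)
    print(f"a={a:.3f}, alpha={alpha:.3f}")  # prints a=5.000, alpha=0.300
```

With real, noisy measurements the same fit gives an estimate rather than an exact recovery. Points on either side of a sharp phase transition (Finding 03) would not lie on a single line in log-log space, so they would need to be fitted separately.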

Abstract

In-context learning (ICL) enables large language models to adapt to new tasks from demonstrations without parameter updates. Despite extensive empirical studies, a principled understanding of ICL emergence at scale remains elusive. We present a unified theoretical framework connecting scaling laws to ICL emergence in transformers. Our analysis establishes that ICL performance follows power-law relationships with model depth $L$, width $d$, context length $k$, and training data $D$, with exponents determined by task structure. We show that under specific conditions, transformers implement gradient-based meta-learning in their forward pass, with an effective learning rate $\eta_{\mathrm{eff}} = \Theta(1/(Ld))$. We demonstrate sharp phase transitions at critical scales and derive optimal depth-width allocations favoring $L^{*} \propto N^{2/3}$, $d^{*} \propto N^{1/3}$ for the fixed…
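The claim that a forward pass can implement gradient descent has a well-known minimal instance from prior work (von Oswald et al., 2023). For in-context linear regression, one unnormalized (no softmax) linear-attention readout gives the same prediction as one gradient step on the squared loss from $w_0 = 0$. The sketch below checks this numerically. It assumes this construction is representative of the mechanism the abstract describes, but it is not the paper's own derivation. Here `lr` is a free parameter, and the sketch does not model the stated scaling $\eta_{\mathrm{eff}} = \Theta(1/(Ld))$.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def one_step_gd_prediction(xs, ys, x_query, lr):
    """One gradient step on 0.5 * sum_i (w . x_i - y_i)**2 from w0 = 0, then predict at x_query."""
    dim = len(x_query)
    w = [0.0] * dim
    # Gradient with respect to w: sum_i (w . x_i - y_i) * x_i.
    grad = [sum((dot(w, x) - y) * x[j] for x, y in zip(xs, ys)) for j in range(dim)]
    w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return dot(w, x_query)


def linear_attention_readout(xs, ys, x_query, lr):
    """Unnormalized linear attention from the query token over the context tokens.

    Assumed weight choice: keys and query read the x-part of each token (x_i, y_i),
    and values read lr * y_i. The output is sum_i v_i * (k_i . q).
    """
    keys = xs
    query = x_query
    values = [lr * y for y in ys]
    return sum(v * dot(k, query) for k, v in zip(keys, values))


if __name__ == "__main__":
    rng = random.Random(0)
    dim, n_ctx, lr = 4, 8, 0.05
    w_true = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    xs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_ctx)]
    ys = [dot(w_true, x) for x in xs]
    x_query = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    gd = one_step_gd_prediction(xs, ys, x_query, lr)
    attn = linear_attention_readout(xs, ys, x_query, lr)
    print(f"one GD step: {gd:.6f}  linear attention: {attn:.6f}")
```

Both functions reduce to $\mathrm{lr} \sum_i y_i \, (x_i \cdot x_q)$, because the gradient at $w_0 = 0$ is $-\sum_i y_i x_i$. The two numbers therefore agree up to floating-point rounding.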

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · Multimodal Machine Learning Applications