Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks
Tianyu He, Darshil Doshi, Aritra Das, Andrey Gromov

TL;DR
This paper investigates how GPT-style transformers develop in-context learning and skill composition abilities on modular arithmetic tasks, revealing structured internal representations and shifts in the learned algorithm.
Contribution
The paper demonstrates a transition from in-distribution to out-of-distribution generalization in transformers as the number of pre-training tasks grows, and identifies two transformer blocks as the minimal depth at which out-of-distribution generalization appears.
Findings
Transformers show a transition to out-of-distribution generalization with more pre-training tasks.
Two transformer blocks are sufficient for out-of-distribution generalization.
Deeper models exhibit transient out-of-distribution capabilities requiring early stopping.
Abstract
Large language models can solve tasks that were not present in the training set. This capability is believed to be due to in-context learning and skill composition. In this work, we study the emergence of in-context learning and skill composition in a collection of modular arithmetic tasks. Specifically, we consider a finite collection of linear modular functions $z = a x + b y \bmod p$ labeled by the vector $(a, b) \in \mathbb{Z}_p^2$. We use some of these tasks for pre-training and the rest for out-of-distribution testing. We empirically show that a GPT-style transformer exhibits a transition from in-distribution to out-of-distribution generalization as the number of pre-training tasks increases. We find that the smallest model capable of out-of-distribution generalization requires two transformer blocks, while for deeper models, the out-of-distribution generalization…
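To make the task setup concrete, here is a minimal sketch of how such a task collection and its in-context sequences might be generated. It takes the linear modular functions to have the form z = (a*x + b*y) mod p with task label (a, b). The function names, the random split, the seed, and the flat (x, y, z) sequence layout are illustrative assumptions, not the authors' data pipeline.

```python
import random

def make_tasks(p):
    """Enumerate every task label (a, b) with a, b in {0, ..., p-1}.

    Assumption: all p*p labels are included; the paper may filter some out.
    """
    return [(a, b) for a in range(p) for b in range(p)]

def split_tasks(tasks, n_pretrain, seed=0):
    """Randomly split task labels into a pre-training set and a held-out
    set for out-of-distribution testing (hypothetical split procedure)."""
    rng = random.Random(seed)
    shuffled = list(tasks)
    rng.shuffle(shuffled)
    return shuffled[:n_pretrain], shuffled[n_pretrain:]

def in_context_sequence(task, p, n_shots, rng):
    """Build a flat sequence of n_shots examples (x, y, z) for one task,
    where z = (a*x + b*y) mod p. The task label itself is never shown;
    a model would have to infer it from the examples in context."""
    a, b = task
    seq = []
    for _ in range(n_shots):
        x, y = rng.randrange(p), rng.randrange(p)
        seq.extend([x, y, (a * x + b * y) % p])
    return seq

if __name__ == "__main__":
    p = 5
    pretrain, held_out = split_tasks(make_tasks(p), n_pretrain=10)
    rng = random.Random(1)
    print(in_context_sequence(held_out[0], p, n_shots=4, rng=rng))
```

Under this sketch, in-distribution evaluation would draw new (x, y) pairs for tasks in `pretrain`, while out-of-distribution evaluation would use tasks from `held_out`, whose labels never appear during pre-training.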
Taxonomy
Topics: Cognitive and developmental aspects of mathematical skills
