Transformer verbatim in-context retrieval across time and scale
Kristijan Armeni, Marko Pranjić, Senja Pollak

TL;DR
This study investigates how language models develop the ability to retrieve arbitrary in-context nouns verbatim during training. It reports a sudden transition early in training, after about 1% of the training tokens, whose development correlates with zero-shot benchmark performance, and an early-training retrieval advantage for concrete over abstract nouns.
Contribution
The paper characterizes when in training, and across which model sizes, verbatim in-context retrieval develops, and links that development to zero-shot benchmark learning and to the concreteness of the target nouns.
Findings
Verbatim in-context retrieval develops suddenly after about 1% of training tokens.
Development of retrieval ability correlates with zero-shot benchmark performance.
Concrete nouns are retrieved more effectively than abstract nouns early in training.
Abstract
To predict upcoming text, language models must in some cases retrieve in-context information verbatim. In this report, we investigated how the ability of language models to retrieve arbitrary in-context nouns developed during training (across time) and as language models trained on the same dataset increase in size (across scale). We then asked whether learning of in-context retrieval correlates with learning of more challenging zero-shot benchmarks. Furthermore, inspired by semantic effects in human short-term memory, we evaluated the retrieval with respect to a major semantic component of target nouns, namely whether they denote a concrete or abstract entity, as rated by humans. We show that verbatim in-context retrieval developed in a sudden transition early in the training process, after about 1% of the training tokens. This was observed across model sizes (from 14M and up to 12B…
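The evaluation idea described in the abstract (retrieving an arbitrary in-context noun and comparing concrete with abstract targets) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the trial format (a noun list followed by a partial repeat), the exact-match accuracy metric, the `<sep>` marker, the example nouns and their labels, and the names `build_trial`, `retrieval_accuracy`, and `copy_baseline`. The toy `copy_baseline` stands in for a language model checkpoint.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical trial format: (context tokens, expected noun, concreteness label).
Trial = Tuple[List[str], str, str]


def build_trial(nouns: Sequence[str], position: int, label: str) -> Trial:
    """Present the noun list once, then repeat it up to (not including) nouns[position].

    `position` must be >= 1 so the context ends on a noun seen earlier.
    """
    context = list(nouns) + ["<sep>"] + list(nouns[:position])
    return context, nouns[position], label


def retrieval_accuracy(
    trials: Sequence[Trial],
    predict_next: Callable[[List[str]], str],
) -> Dict[str, float]:
    """Exact-match accuracy of the predicted next token, grouped by label."""
    hits: Dict[str, int] = {}
    totals: Dict[str, int] = {}
    for context, target, label in trials:
        totals[label] = totals.get(label, 0) + 1
        hits[label] = hits.get(label, 0) + int(predict_next(context) == target)
    return {label: hits[label] / totals[label] for label in totals}


def copy_baseline(context: List[str]) -> str:
    """Toy stand-in for a model: find the previous occurrence of the last
    token and return the token that followed it (pure verbatim copying)."""
    last = context[-1]
    for i in range(len(context) - 2, -1, -1):
        if context[i] == last:
            return context[i + 1]
    return "<unk>"


# Illustrative nouns only; the labels are not taken from any rating norms.
concrete = ["table", "river", "hammer"]
abstract = ["justice", "idea", "freedom"]
trials = [
    build_trial(concrete, 1, "concrete"),
    build_trial(concrete, 2, "concrete"),
    build_trial(abstract, 1, "abstract"),
    build_trial(abstract, 2, "abstract"),
]

scores = retrieval_accuracy(trials, copy_baseline)
print(scores)
```

Under these assumptions, replacing `copy_baseline` with a function that queries successive training checkpoints would give one accuracy value per label per checkpoint, which is the kind of curve in which a sudden transition or a concrete-versus-abstract gap could be inspected.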
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
