Too Big to Think: Capacity, Memorization, and Generalization in Pre-Trained Transformers

Joshua Barron; Devin White

arXiv:2506.09099 · cs.LG · June 19, 2025

TL;DR

This study examines how the capacity of Transformer models pre-trained from scratch affects their ability to memorize facts versus extrapolate to unseen arithmetic cases, and finds a consistent trade-off between the two.

Contribution

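The paper provides a controlled analysis of how model size affects memorization and generalization, using capacity-limited Transformers pre-trained from scratch on two synthetic character-level tasks, and highlights a trade-off that may be intrinsic to pre-training.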

Findings

01

Small models generalize but do not memorize facts.

02

Large models memorize but fail to extrapolate.

03

When trained on both tasks jointly, no model, regardless of size, succeeds at extrapolation.

Abstract

The relationship between memorization and generalization in large language models (LLMs) remains an open area of research, with growing evidence that the two are deeply intertwined. In this work, we investigate this relationship by pre-training a series of capacity-limited Transformer models from scratch on two synthetic character-level tasks designed to separately probe generalization (via arithmetic extrapolation) and memorization (via factual recall). We observe a consistent trade-off: small models extrapolate to unseen arithmetic cases but fail to memorize facts, while larger models memorize but fail to extrapolate. An intermediate-capacity model exhibits a similar shift toward memorization. When trained on both tasks jointly, no model (regardless of size) succeeds at extrapolation. These findings suggest that pre-training may intrinsically favor one learning mode over the other. By…
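
To make the two-task setup concrete, the sketch below builds a pair of toy character-level datasets in the spirit of the abstract: an arithmetic set with some cases held out to test extrapolation, and a set of arbitrary facts that can only be recalled through memorization. The string formats ("a+b=c", "key:value"), the operand range, the hold-out scheme, and the function names are all assumptions made for illustration; the paper's actual data construction is not specified here.

```python
import random


def make_arithmetic_split(max_operand=20, holdout_fraction=0.2, seed=0):
    """Build addition strings and hold out a fraction of operand pairs.

    Assumption: "unseen arithmetic cases" is modeled as operand pairs
    that never appear in training. The paper may define this differently.
    """
    rng = random.Random(seed)
    pairs = [(a, b) for a in range(max_operand) for b in range(max_operand)]
    rng.shuffle(pairs)
    n_holdout = int(len(pairs) * holdout_fraction)
    held_out, seen = pairs[:n_holdout], pairs[n_holdout:]

    def fmt(a, b):
        # Hypothetical character-level format for one example.
        return f"{a}+{b}={a + b}"

    return [fmt(a, b) for a, b in seen], [fmt(a, b) for a, b in held_out]


def make_facts(n_facts=50, key_len=4, value_len=4, seed=0):
    """Build random key:value strings with no learnable rule.

    Assumption: facts are arbitrary associations, so recall accuracy
    measures memorization rather than generalization.
    """
    rng = random.Random(seed)
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    facts = {}
    while len(facts) < n_facts:  # dict keys guarantee unique fact keys
        key = "".join(rng.choice(alphabet) for _ in range(key_len))
        facts[key] = "".join(rng.choice(alphabet) for _ in range(value_len))
    return [f"{k}:{v}" for k, v in facts.items()]


arith_train, arith_test = make_arithmetic_split()
fact_examples = make_facts()
```

Under these assumptions, a model would be scored on `arith_test` for extrapolation and on recall of `fact_examples` for memorization; joint training would simply mix `arith_train` and `fact_examples` into one corpus.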

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Algorithms