Capacity Matters: a Proof-of-Concept for Transformer Memorization on Real-World Data

Anton Changalidis, Aki Härmä

arXiv:2506.14704·cs.CL·June 18, 2025

TL;DR

This study investigates how transformer architecture and data complexity affect memorization capacity, highlighting the importance of embedding size, activation functions, and data structure in optimizing model performance on real-world data.

Contribution

The paper provides empirical evidence on how model size, activation functions, and data complexity influence memorization in transformers, which can inform model design.

Findings

01 Embedding size is the key factor for learning speed and capacity.

02 Softmax activation offers greater stability and capacity.

03 More complex data enhances final memorization.

Abstract

This paper studies how the model architecture and data configurations influence the empirical memorization capacity of generative transformers. The models are trained using synthetic text datasets derived from the Systematized Nomenclature of Medicine (SNOMED) knowledge graph: triplets, representing static connections, and sequences, simulating complex relation patterns. The results show that embedding size is the primary determinant of learning speed and capacity, while additional layers provide limited benefits and may hinder performance on simpler datasets. Activation functions play a crucial role, and Softmax demonstrates greater stability and capacity. Furthermore, increasing the complexity of the data set seems to improve the final memorization. These insights improve our understanding of transformer memory mechanisms and provide a framework for optimizing model design with…
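As a rough illustration of what "empirical memorization capacity" could mean for triplet data, the sketch below scores a model by the fraction of (head, relation, tail) triplets whose tail it reproduces exactly. This is an assumption for illustration only, not the authors' evaluation code: the names `memorization_rate`, `lookup_model`, and `toy_triplets` are hypothetical, the triplets are placeholders rather than SNOMED entries, and a dictionary lookup stands in for a trained transformer.

```python
from typing import Callable, Iterable, Tuple

# (head concept, relation, tail concept); placeholder strings, not real SNOMED entries.
Triplet = Tuple[str, str, str]


def memorization_rate(
    triplets: Iterable[Triplet],
    predict: Callable[[str, str], str],
) -> float:
    """Fraction of triplets whose tail is reproduced exactly from (head, relation)."""
    items = list(triplets)
    if not items:
        return 0.0
    hits = sum(1 for head, rel, tail in items if predict(head, rel) == tail)
    return hits / len(items)


# Hypothetical toy data standing in for knowledge-graph triplets.
toy_triplets = [
    ("concept_a", "is_a", "concept_b"),
    ("concept_b", "part_of", "concept_c"),
    ("concept_c", "is_a", "concept_d"),
    ("concept_d", "part_of", "concept_a"),
]

# Stand-in "model": a lookup table that has stored only three of the four facts.
stored = {(h, r): t for h, r, t in toy_triplets[:3]}


def lookup_model(head: str, relation: str) -> str:
    """Return the stored tail, or an empty string if the fact was never stored."""
    return stored.get((head, relation), "")


rate = memorization_rate(toy_triplets, lookup_model)
print(rate)
```

With a real model, `predict` would wrap text generation from a prompt built out of the head and relation; exact-match scoring is only one possible criterion.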

Code & Models

Repositories

um-dacs-nlp/capacity (PyTorch, official)

Videos

Capacity Matters: a Proof-of-Concept for Transformer Memorization on Real-World Data · underline

Taxonomy

Topics: Power Systems and Technologies · Computational Physics and Python Applications · Advanced Computational Techniques and Applications

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Softmax · Sparse Evolutionary Training