Capacity Matters: a Proof-of-Concept for Transformer Memorization on Real-World Data
Anton Changalidis, Aki Härmä

TL;DR
This study investigates how transformer architecture and data complexity affect memorization capacity, highlighting the importance of embedding size, activation functions, and data structure in optimizing model performance on real-world data.
Contribution
It provides empirical insights into how model size, activation functions, and data complexity influence memorization in transformers, guiding better model design.
Findings
Embedding size is the key factor for learning speed and capacity.
Softmax activation offers greater stability and capacity.
More complex data appears to improve final memorization.
Abstract
This paper studies how model architecture and data configuration influence the empirical memorization capacity of generative transformers. The models are trained on synthetic text datasets derived from the Systematized Nomenclature of Medicine (SNOMED) knowledge graph: triplets, which represent static connections, and sequences, which simulate complex relation patterns. The results show that embedding size is the primary determinant of learning speed and capacity, while additional layers provide limited benefits and may hinder performance on simpler datasets. Activation functions play a crucial role, and Softmax demonstrates greater stability and capacity. Furthermore, increasing the complexity of the dataset seems to improve final memorization. These insights improve our understanding of transformer memory mechanisms and provide a framework for optimizing model design with…
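The triplet setup described in the abstract can be illustrated with a small sketch. It is not the authors' code: the toy edges, the names `to_prompt`, `memorization_rate`, and `make_lookup_model`, and the exact-match scoring rule are all assumptions made for illustration, and the lookup table only stands in for a trained transformer with limited capacity.

```python
from typing import Callable, Dict, List, Tuple

Triplet = Tuple[str, str, str]  # (head concept, relation, tail concept)

# Hypothetical toy edges for illustration; NOT real SNOMED content.
TOY_TRIPLETS: List[Triplet] = [
    ("concept_a", "is_a", "concept_b"),
    ("concept_b", "is_a", "concept_c"),
    ("concept_a", "finding_site", "concept_d"),
    ("concept_d", "part_of", "concept_e"),
]


def to_prompt(head: str, relation: str) -> str:
    """Render the (head, relation) part of a triplet as a text prompt."""
    return f"{head} {relation}"


def memorization_rate(
    predict: Callable[[str], str], triplets: List[Triplet]
) -> float:
    """Fraction of triplets whose tail is reproduced exactly from the prompt.

    Exact-match scoring is an assumption; the abstract does not state the metric.
    """
    if not triplets:
        return 0.0
    correct = sum(
        1 for head, rel, tail in triplets if predict(to_prompt(head, rel)) == tail
    )
    return correct / len(triplets)


def make_lookup_model(triplets: List[Triplet], capacity: int) -> Callable[[str], str]:
    """Stand-in 'model' that stores only the first `capacity` triplets."""
    table: Dict[str, str] = {
        to_prompt(head, rel): tail for head, rel, tail in triplets[:capacity]
    }
    return lambda prompt: table.get(prompt, "")


if __name__ == "__main__":
    for capacity in (2, 4):
        model = make_lookup_model(TOY_TRIPLETS, capacity)
        print(capacity, memorization_rate(model, TOY_TRIPLETS))
```

In an actual experiment, `predict` would wrap a trained generative transformer, and the rate would be tracked while varying embedding size, depth, or activation function.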
Taxonomy
Topics: Power Systems and Technologies · Computational Physics and Python Applications · Advanced Computational Techniques and Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Softmax · Sparse Evolutionary Training
