Language Model Memory and Memory Models for Language

Benjamin L. Badger

arXiv:2602.13466·cs.CL·May 20, 2026

TL;DR

This paper investigates how language models and autoencoders form and utilize memory, proposing new architectures and training methods to improve memory capabilities and computational efficiency.

Contribution

The paper introduces a parallelizable encoder-decoder memory model architecture and a training scheme that combines causal and information-retention objectives to enhance memory formation in language models.

Findings

01. Autoencoder embeddings nearly perfectly memorize input data.

02. Memory embeddings from language models contain little input information.

03. Combined training objectives improve memory richness and decoding ability.

Abstract

The ability of machine learning models to store input information in hidden layer vector embeddings, analogous to the concept of 'memory', is widely employed but not well characterized. We find that language model embeddings typically contain relatively little input information regardless of data and compute scale during training. In contrast, embeddings from autoencoders trained for input regeneration are capable of nearly perfect memory formation. The substitution of memory embeddings for token sequences leads to substantial computational efficiencies, motivating the introduction of a parallelizable encoder-decoder memory model architecture. Upon causal training, these models contain information-poor embeddings incapable of arbitrary information access, but by combining causal and information-retention objective functions they learn to form and decode information-rich memories.…
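The idea of combining a causal objective with an information-retention objective can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the `retention_weight` parameter, and the choice of a simple weighted sum of two cross-entropy terms are all assumptions made for illustration, since the excerpt here does not specify how the objectives are combined.

```python
import math

def cross_entropy(logits, target):
    """Negative log-probability of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

def mean_cross_entropy(logit_rows, targets):
    """Average cross-entropy over paired rows of logits and target ids."""
    assert len(logit_rows) == len(targets) and targets
    total = sum(cross_entropy(row, t) for row, t in zip(logit_rows, targets))
    return total / len(targets)

def combined_loss(causal_logits, recon_logits, tokens, retention_weight=1.0):
    """Hypothetical combined objective (an assumption, not the paper's formula).

    causal_logits[i] scores the token at position i + 1 (next-token prediction).
    recon_logits[i] scores the token at position i, decoded from a memory
    embedding (input regeneration, as in an autoencoder).
    """
    causal = mean_cross_entropy(causal_logits[:-1], tokens[1:])
    retention = mean_cross_entropy(recon_logits, tokens)
    return causal + retention_weight * retention

# Usage: with uniform logits over a 4-token vocabulary, each term equals log(4).
tokens = [0, 2, 1]
uniform = [[0.0] * 4 for _ in tokens]
loss = combined_loss(uniform, uniform, tokens)
```

Setting `retention_weight` to zero in this sketch recovers purely causal training, the regime the abstract associates with information-poor embeddings.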

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Machine Learning in Healthcare