Memorizing Transformers
Yuhuai Wu, Markus N. Rabe, DeLesley Hutchins, Christian Szegedy

TL;DR
This paper proposes a method for language models to memorize and retrieve new information at inference time using a non-differentiable memory, enabling immediate knowledge acquisition without retraining.
Contribution
It introduces a kNN-based memory extension for language models, allowing them to memorize and access new data instantly during inference.
Findings
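Performance improves steadily as the memory grows, up to 262K tokens.
On code and mathematics benchmarks, the model can make use of newly defined functions and theorems at test time.
The approach improves language modeling on generic webtext (C4), math papers (arXiv), books (PG-19), code (Github), and formal theorems (Isabelle).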
Abstract
Language models typically need to be trained or finetuned in order to acquire new knowledge, which involves updating their weights. We instead envision language models that can simply read and memorize new data at inference time, thus acquiring new knowledge immediately. In this work, we extend language models with the ability to memorize the internal representations of past inputs. We demonstrate that an approximate kNN lookup into a non-differentiable memory of recent (key, value) pairs improves language modeling across various benchmarks and tasks, including generic webtext (C4), math papers (arXiv), books (PG-19), code (Github), as well as formal theorems (Isabelle). We show that the performance steadily improves when we increase the size of memory up to 262K tokens. On benchmarks including code and mathematics, we find that the model is capable of making use of newly defined…
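The lookup described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it uses an exact dot-product search in plain Python where the paper uses an approximate kNN lookup, and the class name `KNNMemory`, its methods, and the softmax weighting over the retrieved values are assumptions made for illustration. How the retrieved result is combined with the rest of the model is left out.

```python
import math
from collections import deque


class KNNMemory:
    """Hypothetical bounded store of recent (key, value) pairs with exact top-k lookup."""

    def __init__(self, max_size):
        # deque(maxlen=...) silently drops the oldest pair once the memory is full,
        # so the store always holds the most recent max_size pairs.
        self.pairs = deque(maxlen=max_size)

    def add(self, key, value):
        # Stored vectors are plain copies: nothing is differentiated through them.
        self.pairs.append((list(key), list(value)))

    def lookup(self, query, k):
        """Return the softmax-weighted average of the values of the k best-matching keys."""
        if not self.pairs:
            raise ValueError("memory is empty")
        # Score every stored key by its dot product with the query (exact search).
        scored = [
            (sum(q * x for q, x in zip(query, key)), value)
            for key, value in self.pairs
        ]
        top = sorted(scored, key=lambda item: item[0], reverse=True)[:k]
        # Subtract the best score before exponentiating for numerical stability.
        best = top[0][0]
        weights = [math.exp(score - best) for score, _ in top]
        total = sum(weights)
        dim = len(top[0][1])
        return [
            sum(w * value[i] for w, (_, value) in zip(weights, top)) / total
            for i in range(dim)
        ]


memory = KNNMemory(max_size=2)
memory.add([1.0, 0.0], [10.0, 0.0])
memory.add([0.0, 1.0], [0.0, 10.0])
print(memory.lookup([5.0, 0.0], k=1))  # the value stored under the closest key
```

Because the store is a fixed-size queue, adding a third pair to this two-slot memory evicts the first one, which mirrors the idea of a memory of recent pairs whose size can be varied.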
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Data Classification
