Memorizing Transformers

Yuhuai Wu; Markus N. Rabe; DeLesley Hutchins; Christian Szegedy

arXiv:2203.08913 · cs.LG · March 18, 2022 · 39 cites

TL;DR

This paper proposes a method for language models to memorize and retrieve new information at inference time using a non-differentiable memory, enabling immediate knowledge acquisition without retraining.

Contribution

It introduces a kNN-based memory extension for language models, allowing them to memorize and access new data instantly during inference.

Findings

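01 Language-modeling performance improves steadily as memory size increases, up to 262K tokens.

02 On code and mathematics benchmarks, the model can make use of newly defined functions and theorems at test time.

03 The approach improves language modeling across diverse datasets: webtext (C4), math papers (arXiv), books (PG-19), code (Github), and formal theorems (Isabelle).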

Abstract

Language models typically need to be trained or finetuned in order to acquire new knowledge, which involves updating their weights. We instead envision language models that can simply read and memorize new data at inference time, thus acquiring new knowledge immediately. In this work, we extend language models with the ability to memorize the internal representations of past inputs. We demonstrate that an approximate kNN lookup into a non-differentiable memory of recent (key, value) pairs improves language modeling across various benchmarks and tasks, including generic webtext (C4), math papers (arXiv), books (PG-19), code (Github), as well as formal theorems (Isabelle). We show that the performance steadily improves when we increase the size of memory up to 262K tokens. On benchmarks including code and mathematics, we find that the model is capable of making use of newly defined…
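The lookup described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses exact dot-product search where the paper uses an approximate kNN, and the FIFO eviction policy, the softmax weighting, and the names `KNNMemory` and `attend` are all assumptions made for illustration.

```python
import math
from collections import deque


class KNNMemory:
    """Hypothetical fixed-size memory of (key, value) pairs with exact top-k lookup."""

    def __init__(self, max_size):
        # Assumption: the oldest pairs are evicted once max_size is reached.
        self.entries = deque(maxlen=max_size)

    def add(self, keys, values):
        # Pairs are stored as plain data; no gradient flows into the memory.
        for k, v in zip(keys, values):
            self.entries.append((list(k), list(v)))

    def lookup(self, query, top_k):
        # Exact search by dot product (the paper uses an approximate kNN instead).
        scored = [(sum(q * x for q, x in zip(query, k)), v) for k, v in self.entries]
        scored.sort(key=lambda s: s[0], reverse=True)
        return scored[:top_k]


def attend(query, memory, top_k):
    """Softmax-weighted average of the values belonging to the top_k retrieved keys."""
    hits = memory.lookup(query, top_k)
    if not hits:
        return None
    m = max(score for score, _ in hits)  # subtract the max for numerical stability
    weights = [math.exp(score - m) for score, _ in hits]
    total = sum(weights)
    dim = len(hits[0][1])
    return [sum(w * v[i] for w, (_, v) in zip(weights, hits)) / total for i in range(dim)]
```

Because new pairs are simply appended, anything written with `add` is available to the next `lookup` without any weight update, which is the property the abstract emphasizes.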

Code & Models

Models

abhinavv3/GPT_with_Modified_Memorizing_Transformer (Hugging Face model)

Videos

Memorizing Transformers · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Data Classification