Extended Mind Transformers

Phoebe Klett; Thomas Ahle

arXiv:2406.02332 · cs.LG · June 5, 2024

TL;DR

Extended Mind Transformers improve long-input processing in pre-trained language models by letting them retrieve from a bank of pre-computed memories, with improved retrieval and positional-encoding handling. They outperform the current state of the art by 6% on average on a new benchmark.
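A memory bank of this kind can be pictured as a per-layer cache of key/value pairs computed once from the long input, before generation starts. The sketch below is a minimal illustration under stated assumptions, not the official implementation: it assumes a single attention head per layer whose key and value projections are plain matrices, and the names `matvec` and `build_memory_bank` are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Memory = Tuple[Vector, Vector]  # (key, value)


def matvec(matrix: Matrix, x: Vector) -> Vector:
    """Multiply a (d_out x d_in) matrix by a d_in-dimensional vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in matrix]


def build_memory_bank(
    layer_inputs: Dict[int, List[Vector]],
    key_proj: Dict[int, Matrix],
    value_proj: Dict[int, Matrix],
) -> Dict[int, List[Memory]]:
    """Cache one (key, value) pair per token, separately for each layer.

    layer_inputs[l] holds the hidden states entering layer l for every token
    of the long document. They are projected once, ahead of generation, with
    that layer's own key and value matrices (hypothetical stand-ins here).
    """
    bank: Dict[int, List[Memory]] = {}
    for layer, states in layer_inputs.items():
        w_k, w_v = key_proj[layer], value_proj[layer]
        bank[layer] = [(matvec(w_k, h), matvec(w_v, h)) for h in states]
    return bank


# Toy example: one layer, three 2-d hidden states.
identity = [[1.0, 0.0], [0.0, 1.0]]
double = [[2.0, 0.0], [0.0, 2.0]]
states = {0: [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]}
bank = build_memory_bank(states, key_proj={0: identity}, value_proj={0: double})
print(len(bank[0]))  # 3
print(bank[0][2])    # ([1.0, 1.0], [2.0, 2.0])
```

Keeping the bank per layer mirrors the paper's finding that retrieval matters in most decoder layers; which layers actually consult the bank is left open in this sketch.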

Contribution

The paper revisits Memorizing Transformers (Wu et al., 2022) and fixes several of its shortcomings, enabling long-range memory retrieval without fine-tuning. It also releases a new counterfactual long-range retrieval benchmark for evaluation.

Findings

01. Outperforms the state of the art by 6% on average
02. External memory retrieval is crucial in most decoder layers
03. Proper positional encoding updates are essential for memory access
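Why positional encodings need care is easiest to see with rotary position embeddings (RoPE), which many open decoder models use. With RoPE, a query-key score depends only on the offset between the two positions, so the position a cached key is encoded with determines how the current query scores it. The sketch below is illustrative only: it demonstrates this property and assumes memories are cached without rotation so that a position can be chosen at retrieval time. It does not reproduce the paper's specific update rule, and `rope` and `score` are hypothetical helper names.

```python
import math
from typing import List

Vector = List[float]


def rope(x: Vector, position: int, base: float = 10000.0) -> Vector:
    """Rotate consecutive pairs (x[2i], x[2i+1]) by angle position * theta_i.

    theta_i = base ** (-2i / d), following the RoFormer formulation of rotary
    position embeddings.
    """
    d = len(x)
    if d % 2:
        raise ValueError("RoPE needs an even dimension")
    out: Vector = []
    for i in range(d // 2):
        angle = position * base ** (-2.0 * i / d)
        c, s = math.cos(angle), math.sin(angle)
        a, b = x[2 * i], x[2 * i + 1]
        out.extend([a * c - b * s, a * s + b * c])
    return out


def score(query: Vector, q_pos: int, key: Vector, k_pos: int) -> float:
    """Attention logit between a query and a key, each rotated to its position."""
    q_rot, k_rot = rope(query, q_pos), rope(key, k_pos)
    return sum(a * b for a, b in zip(q_rot, k_rot))


q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]

# Only the offset matters: shifting both positions by 1000 keeps the score.
print(math.isclose(score(q, 10, k, 3), score(q, 1010, k, 1003)))  # True

# An unrotated cached key can be placed anywhere at retrieval time, e.g. at
# its true location (3) or just behind the current query (4999).
far = score(q, 5000, k, 3)
near = score(q, 5000, k, 4999)
print(far, near)
```

Which position to assign to retrieved memories is the design choice the paper studies; the values 3 and 4999 above are arbitrary.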

Abstract

Pre-trained language models demonstrate general intelligence and common sense, but long inputs quickly become a bottleneck for memorizing information at inference time. We resurface a simple method, Memorizing Transformers (Wu et al., 2022), that gives the model access to a bank of pre-computed memories. We show that it is possible to fix many of the shortcomings of the original method, such as the need for fine-tuning, by critically assessing how positional encodings should be updated for the keys and values retrieved. This intuitive method uses the model's own key/query system to select and attend to the most relevant memories at each generation step, rather than using external embeddings. We demonstrate the importance of external information being retrieved in a majority of decoder layers, contrary to previous work. We open source a new counterfactual long-range retrieval benchmark,…
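The retrieval step described in the abstract can be sketched for a single attention head as follows. This is a minimal illustration under simplifying assumptions, not the code in the official repository: it uses exact top-k search by dot product, ignores positional encodings, multiple heads, and batching, and the names `top_k_memories`, `attend`, and `extended_mind_step` are hypothetical.

```python
import heapq
import math
from typing import List, Tuple

Vector = List[float]
Memory = Tuple[Vector, Vector]  # (key, value)


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def top_k_memories(query: Vector, bank: List[Memory], k: int) -> List[Memory]:
    """Return the k memories whose keys have the highest dot product with the query.

    Ranking uses the attention layer's own query and keys, so no external
    embedding model is involved. This is exact search; a large bank would
    more likely use an approximate nearest-neighbour index.
    """
    return heapq.nlargest(k, bank, key=lambda mem: dot(query, mem[0]))


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product softmax attention for a single query."""
    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, key) * scale for key in keys]
    peak = max(logits)  # subtract the max logit for numerical stability
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)
    return [
        sum(w * value[j] for w, value in zip(weights, values)) / total
        for j in range(len(values[0]))
    ]


def extended_mind_step(
    query: Vector,
    local_keys: List[Vector],
    local_values: List[Vector],
    bank: List[Memory],
    k: int = 2,
) -> Vector:
    """Retrieve the top-k memories, then attend over them and the
    local context together in one softmax."""
    retrieved = top_k_memories(query, bank, k)
    keys = local_keys + [key for key, _ in retrieved]
    values = local_values + [value for _, value in retrieved]
    return attend(query, keys, values)


bank = [
    ([1.0, 0.0], [4.0, 0.0]),
    ([-1.0, 0.0], [100.0, 0.0]),
    ([0.9, 0.1], [5.0, 5.0]),
]
query = [1.0, 0.0]
print([value for _, value in top_k_memories(query, bank, 2)])  # [[4.0, 0.0], [5.0, 5.0]]
print(extended_mind_step(query, [[1.0, 0.0]], [[2.0, 0.0]], bank, k=1))  # [3.0, 0.0]
```

Because the retrieved keys are scored with the same query that attends over the local context, relevance is judged in the model's own attention space; in this sketch the memories then compete with local tokens inside a single softmax.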

Code & Models

Repositories

normal-computing/extended-mind-transformers
pytorch · Official

Datasets

normalcomputing/wikiqa-counterfactual
dataset · 23 downloads

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Topic Modeling