Extended Mind Transformers
Phoebe Klett, Thomas Ahle

TL;DR
Extended Mind Transformers enhance long-input processing in language models by utilizing a memory bank with improved retrieval and positional encoding, outperforming current methods by 6% on a new benchmark.
Contribution
The paper introduces improvements to Memorizing Transformers, enabling better long-range memory retrieval without fine-tuning, and provides a new benchmark for evaluation.
Findings
- Outperforms state-of-the-art methods by 6% on average
- Retrieving external memories is crucial in most decoder layers
- Correctly updating positional encodings is essential for memory access
Abstract
Pre-trained language models demonstrate general intelligence and common sense, but long inputs quickly become a bottleneck for memorizing information at inference time. We resurface a simple method, Memorizing Transformers (Wu et al., 2022), that gives the model access to a bank of pre-computed memories. We show that it is possible to fix many of the shortcomings of the original method, such as the need for fine-tuning, by critically assessing how positional encodings should be updated for the keys and values retrieved. This intuitive method uses the model's own key/query system to select and attend to the most relevant memories at each generation step, rather than using external embeddings. We demonstrate the importance of external information being retrieved in a majority of decoder layers, contrary to previous work. We open source a new counterfactual long-range retrieval benchmark,…
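The retrieval step described in the abstract can be illustrated with a minimal single-head NumPy sketch. It is not the authors' implementation: the function names, the plain dot-product similarity, and the toy data are assumptions made for illustration, and the positional-encoding update that the paper identifies as essential is left out entirely.

```python
import numpy as np

def retrieve_top_k(q, mem_k, top_k):
    """Return indices of the top_k memory keys most similar to the query.

    Similarity is a plain dot product (an assumption for this sketch).
    """
    sims = mem_k @ q
    top_k = min(top_k, len(sims))
    # argsort is ascending, so reverse it and keep the first top_k entries
    return np.argsort(sims)[::-1][:top_k]

def attend_with_memories(q, local_k, local_v, mem_k, mem_v, top_k=2):
    """Attend over retrieved memories together with the local context.

    q:        (d,)     query for the current generation step
    local_k:  (n, d)   keys of the local context
    local_v:  (n, dv)  values of the local context
    mem_k:    (m, d)   pre-computed memory keys
    mem_v:    (m, dv)  pre-computed memory values
    """
    idx = retrieve_top_k(q, mem_k, top_k)
    # Retrieved memories are placed ahead of the local keys and values
    keys = np.concatenate([mem_k[idx], local_k], axis=0)
    values = np.concatenate([mem_v[idx], local_v], axis=0)
    # Scaled dot-product attention with a numerically stable softmax
    scores = keys @ q / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ values, idx, weights

# Toy data (hypothetical): four memories, one local token
q = np.array([1.0, 0.0])
mem_k = np.array([[0.0, 1.0], [5.0, 0.0], [-1.0, 0.0], [2.0, 0.0]])
mem_v = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, 0.0], [1.0, 1.0]])
local_k = np.array([[0.5, 0.5]])
local_v = np.array([[0.0, 0.0]])

out, idx, weights = attend_with_memories(q, local_k, local_v, mem_k, mem_v, top_k=2)
print("retrieved memory indices:", idx)
```

In a full decoder this selection would run per attention head at each generation step, and, per the abstract, in a majority of decoder layers rather than a single one.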
Code & Models
- 🤗 normalcomputing/extended-mind-mpt-7b · model · 18 dl · ♡ 29
- 🤗 normalcomputing/extended-mind-mpt-7b-chat · model · 22 dl · ♡ 11
- 🤗 normalcomputing/extended-mind-llama-2-7b · model · 16 dl · ♡ 5
- 🤗 normalcomputing/extended-mind-llama-2-7b-chat · model · 7 dl · ♡ 1
- 🤗 normalcomputing/extended-mind-llama-2-70b-chat · model · 4 dl
- 🤗 normalcomputing/extended-mind-llama-2-70b · model · 4 dl · ♡ 1
- 🤗 normalcomputing/extended-mind-mpt-30b · model · 9 dl
- 🤗 normalcomputing/extended-mind-mpt-30b-chat · model · 25 dl · ♡ 2
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Topic Modeling
