Unlimiformer: Long-Range Transformers with Unlimited Length Input
Amanda Bertsch, Uri Alon, Graham Neubig, Matthew R. Gormley

TL;DR
Unlimiformer lets existing pretrained encoder-decoder transformers process inputs of practically unlimited length by approximating cross-attention with kNN retrieval, extending them to long-document tasks without retraining.
Contribution
Introduces a general method that wraps a pretrained encoder-decoder transformer and offloads its cross-attention to a kNN index, so the model can handle inputs of practically unlimited length without additional training or code modifications.
Findings
- Processes inputs of up to 500k tokens from the BookSum dataset without truncation.
- Improves the performance of BART and Longformer on long-document tasks.
- Queries the kNN index in sub-linear time.
Abstract
Since the proposal of transformers, these models have been limited to bounded input lengths, because of their need to attend to every token in the input. In this work, we propose Unlimiformer: a general approach that wraps any existing pretrained encoder-decoder transformer, and offloads the cross-attention computation to a single k-nearest-neighbor (kNN) index, while the returned kNN distances are the attention dot-product scores. This kNN index can be kept on either the GPU or CPU memory and queried in sub-linear time; this way, we can index practically unlimited input sequences, while every attention head in every decoder layer retrieves its top-k keys, instead of attending to every key. We evaluate Unlimiformer on several long-document and book-summarization benchmarks, showing that it can process even 500k token-long inputs from the BookSum dataset, without any input truncation at…
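The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single attention head, unscaled dot-product scores, and an exact brute-force search standing in for the kNN index (a real system would query an approximate index in sub-linear time). The function names `full_attention` and `topk_attention` are hypothetical and chosen only for this example.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    x = x - np.max(x)
    e = np.exp(x)
    return e / e.sum()


def full_attention(query, keys, values):
    """Standard attention for one query: attend to every key."""
    scores = keys @ query                  # one dot product per input token
    return softmax(scores) @ values


def topk_attention(query, keys, values, k):
    """Attend only to the k keys with the highest dot-product score.

    The brute-force scoring below is a stand-in (an assumption of this
    sketch) for querying a kNN index built over the encoded input.
    """
    scores = keys @ query
    k = min(k, scores.shape[0])
    # Indices of the k largest scores, in no particular order.
    top = np.argpartition(-scores, k - 1)[:k]
    # The retrieved scores are used directly as attention logits.
    weights = softmax(scores[top])
    return weights @ values[top], top


rng = np.random.default_rng(0)
n_tokens, d = 10_000, 64                   # illustrative sizes only
keys = rng.standard_normal((n_tokens, d))
values = rng.standard_normal((n_tokens, d))
query = rng.standard_normal(d)

approx, retrieved = topk_attention(query, keys, values, k=16)
exact = full_attention(query, keys, values)
```

When `k` equals the number of keys, `topk_attention` reduces to `full_attention`. With a smaller `k`, only the retrieved keys contribute, so the cost of the softmax and weighted sum no longer grows with the input length.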
Code & Models
- 🤗 abertsch/unlimiformer-bart-booksum-retrieval (model) · 4 dl · ♡ 1
- 🤗 abertsch/unlimiformer-bart-booksum-alternating (model) · 4 dl · ♡ 3
- 🤗 abertsch/bart-base-booksum (model) · 4 dl
- 🤗 abertsch/unlimiformer-earlyk-bart-booksum (model) · 4 dl
- 🤗 abertsch/unlimiformer-bart-booksum-random-encoding (model) · 4 dl
- 🤗 abertsch/bart-base-govreport (model) · 10 dl · ♡ 2
- 🤗 abertsch/unlimiformer-bart-govreport-earlyk (model) · 5 dl
- 🤗 abertsch/unlimiformer-bart-govreport-alternating (model) · 10 dl · ♡ 2
- 🤗 abertsch/bart-base-summscreen (model) · 5 dl · ♡ 1
- 🤗 abertsch/unlimiformer-bart-summscreen-earlyk (model) · 6 dl · ♡ 1
Taxonomy
Topics: Handwritten Text Recognition Techniques · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Weight Decay · AdamW · Residual Connection · Dense Connections
