Efficient Transformers: A Survey
Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler

TL;DR
This survey reviews recent efficient Transformer architectures designed to reduce computational and memory costs, providing an organized overview of models like Reformer, Linformer, Performer, and Longformer across various domains.
Contribution
It offers a comprehensive and organized overview of recent efficiency-focused Transformer models, aiding researchers in understanding current advancements and trends.
Findings
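Covers Transformer variants that improve on the original architecture, many of them in computational and memory efficiency
Highlights key models such as Reformer, Linformer, Performer, and Longformer, and the domains they are applied to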
Provides insights into future research directions
Abstract
Transformer model architectures have garnered immense interest lately due to their effectiveness across a range of domains like language, vision and reinforcement learning. In the field of natural language processing for example, Transformers have become an indispensable staple in the modern deep learning stack. Recently, a dizzying number of "X-former" models have been proposed - Reformer, Linformer, Performer, Longformer, to name a few - which improve upon the original Transformer architecture, many of which make improvements around computational and memory efficiency. With the aim of helping the avid researcher navigate this flurry, this paper characterizes a large and thoughtful selection of recent efficiency-flavored "X-former" models, providing an organized and comprehensive overview of existing work and models across multiple domains.
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Advanced Neural Network Applications · Speech Recognition and Synthesis
Methods: Absolute Position Encodings · Position-Wise Feed-Forward Layer · Linear Layer · Multi-Head Linear Attention · Convolution · Layer Normalization · 1x1 Convolution · Weight Decay
