Efficient Transformers: A Survey

Yi Tay; Mostafa Dehghani; Dara Bahri; Donald Metzler

arXiv:2009.06732 · cs.LG · March 15, 2022 · 222 citations

TL;DR

This survey reviews recent efficient Transformer architectures designed to reduce computational and memory costs, providing an organized overview of models like Reformer, Linformer, Performer, and Longformer across various domains.

Contribution

The survey offers a comprehensive and organized overview of recent efficiency-focused Transformer models, helping researchers understand current advances and trends.

Findings

1. Survey covers a wide range of efficiency improvements
2. Highlights key models and their domain applications
3. Provides insights into future research directions

Abstract

Transformer model architectures have garnered immense interest lately due to their effectiveness across a range of domains like language, vision and reinforcement learning. In the field of natural language processing for example, Transformers have become an indispensable staple in the modern deep learning stack. Recently, a dizzying number of "X-former" models have been proposed - Reformer, Linformer, Performer, Longformer, to name a few - which improve upon the original Transformer architecture, many of which make improvements around computational and memory efficiency. With the aim of helping the avid researcher navigate this flurry, this paper characterizes a large and thoughtful selection of recent efficiency-flavored "X-former" models, providing an organized and comprehensive overview of existing work and models across multiple domains.

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Advanced Neural Network Applications · Speech Recognition and Synthesis

Methods: Absolute Position Encodings · Position-Wise Feed-Forward Layer · Linear Layer · Multi-Head Linear Attention · Convolution · Layer Normalization · 1x1 Convolution · Weight Decay