ELFATT: Efficient Linear Fast Attention for Vision Transformers
Chong Wu, Maolin Che, Renjie Xu, Zhuoheng Ran, Hong Yan

TL;DR
ELFATT introduces a linear attention mechanism that speeds up vision transformers with minimal performance loss (4-7x over vanilla softmax-based attention on high-resolution vision tasks). This makes it well suited to high-resolution inputs and to resource-constrained environments such as edge GPUs.
Contribution
The paper proposes ELFATT, a linear attention mechanism that reduces memory input/output operations and achieves linear computational complexity while maintaining high performance. It is compatible with FlashAttention-2 and applies to both vision and non-vision tasks.
Findings
4-7x speedup over vanilla softmax-based attention in high-resolution vision tasks
2-3x speedup with FlashAttention-2 acceleration
Effective in non-vision long-range tasks and on edge GPUs
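The following rough multiply-accumulate (MAC) count shows why the gap between quadratic and linear attention widens with resolution. It is a back-of-the-envelope sketch, not the paper's cost model. It assumes a ViT-style 16x16 patch embedding, a hypothetical per-head dimension d = 64, and a generic linear attention that computes φ(K)ᵀV before multiplying by φ(Q). Projections, softmax, and normalization are ignored, and the helper names are hypothetical.

```python
def num_tokens(height, width, patch):
    # Token count for a ViT-style patch embedding (assumes H and W divisible by patch).
    return (height // patch) * (width // patch)


def softmax_attn_macs(n, d):
    # Q K^T costs n*n*d MACs and A V costs another n*n*d: quadratic in n.
    return 2 * n * n * d


def linear_attn_macs(n, d):
    # phi(K)^T V costs n*d*d MACs and phi(Q) (phi(K)^T V) another n*d*d: linear in n.
    return 2 * n * d * d


if __name__ == "__main__":
    d = 64  # hypothetical per-head dimension
    for side in (224, 512, 1024):
        n = num_tokens(side, side, 16)
        ratio = softmax_attn_macs(n, d) / linear_attn_macs(n, d)
        print(f"{side}x{side}: N={n}, softmax/linear MAC ratio = {ratio:.1f}")
```

The ratio simplifies to N/d. For 224, 512, and 1024 pixel squares this gives N = 196, 1024, and 4096, with ratios of about 3.1, 16, and 64. Measured wall-clock speedups, such as the 4-7x reported above, also depend on memory traffic and kernel implementation, so the two kinds of numbers are not directly comparable.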
Abstract
The attention mechanism is the key to the success of transformers in different machine learning tasks. However, the vanilla softmax-based attention mechanism has quadratic complexity with respect to the sequence length, which becomes the major bottleneck in long-sequence tasks such as vision tasks. Although various efficient linear attention mechanisms have been proposed, they sacrifice performance to achieve high efficiency. Moreover, memory-efficient methods such as FlashAttention-1-3 still have quadratic computational complexity, which leaves room for further improvement. In this paper, we propose a novel efficient linear fast attention (ELFATT) mechanism to achieve low memory input/output operations, linear computational complexity, and high performance at the same time. ELFATT offers 4-7x speedups over the vanilla softmax-based attention mechanism in high-resolution…
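The complexity argument in the abstract can be illustrated with a generic kernelized linear attention. The sketch uses the elu(x) + 1 feature map popularized by Katharopoulos et al. (2020), which is a different, well-known technique and not ELFATT itself. All function names are hypothetical, and plain Python lists are used so that no dependencies are needed.

The sketch contrasts vanilla softmax attention, which materializes an N x N matrix, with one kernel evaluated in two orders. Computing (φ(Q)φ(K)ᵀ)V costs O(N²), while φ(Q)(φ(K)ᵀV) costs O(N). By associativity of matrix products, both orders give the same output.

```python
import math
import random


def matmul(a, b):
    # Naive product of an (n x k) and a (k x m) matrix, both stored as lists of rows.
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax_attention(q, k, v):
    # Vanilla attention: builds the full N x N score matrix,
    # so time is O(N^2 d) and memory is O(N^2).
    scale = 1.0 / math.sqrt(len(q[0]))
    weights = []
    for row in matmul(q, transpose(k)):
        scaled = [s * scale for s in row]
        m = max(scaled)  # subtract the row max for numerical stability
        exps = [math.exp(s - m) for s in scaled]
        z = sum(exps)
        weights.append([e / z for e in exps])
    return matmul(weights, v)


def feature_map(x):
    # elu(x) + 1: a strictly positive feature map (illustrative choice, not ELFATT's).
    return [[t + 1.0 if t > 0 else math.exp(t) for t in row] for row in x]


def kernel_attention_quadratic(q, k, v):
    # Same kernel as linear_attention, evaluated in the O(N^2) order (phi(Q) phi(K)^T) V.
    fq, fk = feature_map(q), feature_map(k)
    a = matmul(fq, transpose(fk))  # N x N similarity matrix
    out = matmul(a, v)
    return [[x / sum(a_row) for x in row] for row, a_row in zip(out, a)]


def linear_attention(q, k, v):
    # Reassociated order phi(Q) (phi(K)^T V): a d x d_v summary replaces the
    # N x N matrix, so time is O(N d d_v) and memory does not grow with N^2.
    fq, fk = feature_map(q), feature_map(k)
    kv = matmul(transpose(fk), v)           # d x d_v
    k_sum = [sum(col) for col in zip(*fk)]  # length-d normalizer
    num = matmul(fq, kv)                    # N x d_v
    den = [sum(f * s for f, s in zip(row, k_sum)) for row in fq]
    return [[x / z for x in row] for row, z in zip(num, den)]


if __name__ == "__main__":
    random.seed(0)
    n, d, dv = 6, 4, 3
    q = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    k = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    v = [[random.gauss(0, 1) for _ in range(dv)] for _ in range(n)]
    lin, quad = linear_attention(q, k, v), kernel_attention_quadratic(q, k, v)
    gap = max(abs(a - b) for r1, r2 in zip(lin, quad) for a, b in zip(r1, r2))
    print("max |linear - quadratic| =", gap)
```

The linear form never builds the N x N matrix, which is what lets linear attention scale to high-resolution inputs. The abstract notes that such efficient approximations usually sacrifice performance, and ELFATT's stated goal is to keep linear complexity without that loss.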
Taxonomy
Topics: CCD and CMOS Imaging Sensors · Image Processing Techniques and Applications · Advanced Memory and Neural Computing
Methods: Softmax · Attention Is All You Need · Diffusion
