SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications

Abdelrahman Shaker; Muhammad Maaz; Hanoona Rasheed; Salman Khan; Ming-Hsuan Yang; Fahad Shahbaz Khan

arXiv:2303.15446·cs.CV·July 27, 2023·5 cites


Open Access · 5 Repos · 7 Models

TL;DR

SwiftFormer introduces an efficient additive attention mechanism that replaces quadratic matrix operations with linear element-wise multiplications, enabling high-accuracy, real-time vision models suitable for mobile devices.

Contribution

The paper proposes a novel additive attention mechanism that reduces the computational complexity of attention from quadratic to linear and can be used throughout the network, improving speed and accuracy for mobile vision applications.

Findings

01 Achieves 78.5% top-1 ImageNet accuracy with a small model

02 Runs at 0.8 ms latency on iPhone 14

03 Outperforms MobileViT-v2 in both speed and accuracy

Abstract

Self-attention has become a de facto choice for capturing global context in various vision applications. However, its quadratic computational complexity with respect to image resolution limits its use in real-time applications, especially for deployment on resource-constrained mobile devices. Although hybrid approaches have been proposed to combine the advantages of convolutions and self-attention for a better speed-accuracy trade-off, the expensive matrix multiplication operations in self-attention remain a bottleneck. In this work, we introduce a novel efficient additive attention mechanism that effectively replaces the quadratic matrix multiplication operations with linear element-wise multiplications. Our design shows that the key-value interaction can be replaced with a linear layer without sacrificing any accuracy. Unlike previous state-of-the-art methods, our efficient formulation…
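The idea in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the weight names (w_q, w_k, w_a, w_t), the softmax over token scores, and the residual addition of the query are all assumptions made for illustration. It shows how a single pooled "global query" lets every token interact through element-wise products and one linear layer, without ever forming an n x n attention matrix.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def linear(vec, weight):
    """Bias-free linear layer; weight is a list of rows with shape (d_in, d_out)."""
    d_out = len(weight[0])
    return [sum(vec[i] * weight[i][j] for i in range(len(vec))) for j in range(d_out)]


def softmax(scores):
    """Numerically stable softmax over a list of scalars."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(tokens, w_q, w_k, w_a, w_t):
    """Hypothetical additive-attention token mixer for n tokens of width d."""
    d = len(w_a)
    queries = [l2_normalize(linear(t, w_q)) for t in tokens]
    keys = [l2_normalize(linear(t, w_k)) for t in tokens]

    # One scalar score per token: n dot products of length d, i.e. O(n * d).
    scores = [sum(q[i] * w_a[i] for i in range(d)) / math.sqrt(d) for q in queries]
    alphas = softmax(scores)  # assumption: scores are normalized with a softmax

    # Pool all queries into one global query vector of length d.
    global_query = [sum(a * q[i] for a, q in zip(alphas, queries)) for i in range(d)]

    outputs = []
    for q, k in zip(queries, keys):
        # Element-wise key * global-query product; no n x n matrix is built.
        mixed = [k[i] * global_query[i] for i in range(d)]
        # A linear layer stands in for the key-value interaction (w_t is hypothetical).
        outputs.append([m + qi for m, qi in zip(linear(mixed, w_t), q)])
    return outputs


def random_matrix(rows, cols, rng):
    """Helper for the demo: a rows x cols matrix of uniform random weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def token_mixing_mults(n, d):
    """Multiplications spent on token interaction only (projections excluded)."""
    pairwise = 2 * n * n * d  # Q K^T scores, then the attention-weighted sum of values
    additive = 3 * n * d      # scores, global-query pooling, key * global query
    return pairwise, additive


if __name__ == "__main__":
    rng = random.Random(0)
    n, d = 6, 4
    tokens = random_matrix(n, d, rng)
    w_q, w_k, w_t = (random_matrix(d, d, rng) for _ in range(3))
    w_a = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    out = additive_attention(tokens, w_q, w_k, w_a, w_t)
    print(len(out), len(out[0]))
    print(token_mixing_mults(196, 64))
```

In this sketch, doubling the number of tokens doubles the additive token-mixing cost, whereas the pairwise cost quadruples. The counts ignore the query, key, and output projections, which are linear in the number of tokens in both cases.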


Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Brain Tumor Detection and Classification

Methods: Tanh Activation · Linear Layer