Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks
Meng-Hao Guo, Zheng-Ning Liu, Tai-Jiang Mu, Shi-Min Hu

TL;DR
This paper introduces external attention, an attention mechanism built from two small, learnable memories that are shared across all samples and implemented as simple linear layers. It is a cheaper alternative to self-attention that matches or exceeds its performance on a range of visual tasks.
Contribution
The paper proposes external attention, a new attention mechanism with linear complexity that can replace self-attention in existing architectures. Because it needs only linear and normalization layers, it also makes an all-MLP model for image classification possible.
Findings
External attention achieves comparable or superior results to self-attention.
It substantially reduces computational and memory costs, since its complexity is linear rather than quadratic in the number of positions.
It is effective across multiple visual tasks, including classification and segmentation.
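To make the cost claim concrete, the sketch below counts multiply-adds for the core matrix products only. It assumes N positions with d channels; self-attention computes Q K^T and A V (each N×N×d), while external attention computes F M_k^T and A M_v against a memory of S slots (each N×S×d). Projections, normalization and multi-head splitting are ignored, and the example sizes (a 64×64 feature map, d = 512, S = 64) are illustrative assumptions, not figures reported in the paper.

```python
def self_attention_macs(n: int, d: int) -> int:
    """Multiply-adds for Q K^T (n x d by d x n) plus A V (n x n by n x d)."""
    return n * n * d + n * n * d


def external_attention_macs(n: int, d: int, s: int) -> int:
    """Multiply-adds for F M_k^T (n x d by d x s) plus A M_v (n x s by s x d)."""
    return n * d * s + n * s * d


if __name__ == "__main__":
    n = 64 * 64  # positions in an assumed 64 x 64 feature map
    d = 512      # assumed channel width
    s = 64       # assumed number of memory slots
    print(self_attention_macs(n, d))
    print(external_attention_macs(n, d, s))
    # The ratio is n / s: quadratic versus linear growth in n.
    print(self_attention_macs(n, d) // external_attention_macs(n, d, s))  # 64
```

Doubling the spatial resolution multiplies N by 4, which multiplies the self-attention count by 16 but the external-attention count only by 4 for a fixed S.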
Abstract
Attention mechanisms, especially self-attention, have played an increasingly important role in deep feature representation for visual tasks. Self-attention updates the feature at each position by computing a weighted sum of features using pair-wise affinities across all positions to capture the long-range dependency within a single sample. However, self-attention has quadratic complexity and ignores potential correlation between different samples. This paper proposes a novel attention mechanism which we call external attention, based on two external, small, learnable, shared memories, which can be implemented easily by simply using two cascaded linear layers and two normalization layers; it conveniently replaces self-attention in existing popular architectures. External attention has linear complexity and implicitly considers the correlations between all data samples. We further…
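The abstract describes external attention as two cascaded linear layers with normalization in between. Below is a minimal pure-Python sketch of that forward pass for a single sample. It assumes an N×d feature matrix F, two S×d memories M_k and M_v, attention A = Norm(F M_k^T) and output A M_v. For Norm it assumes a "double normalization": a softmax over the N positions for each memory slot, followed by L1 normalization over the S slots for each position. The helper names are hypothetical, and in a real model M_k and M_v would be learned parameters rather than fixed lists.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product: (n x k) @ (k x m) -> (n x m)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def double_normalize(logits: Matrix) -> Matrix:
    """Assumed Norm: softmax over positions (per memory slot), then L1 over slots."""
    n, s = len(logits), len(logits[0])
    out = [[0.0] * s for _ in range(n)]
    # Softmax down each column (across the n positions), max-shifted for stability.
    for j in range(s):
        col = [logits[i][j] for i in range(n)]
        peak = max(col)
        exps = [math.exp(v - peak) for v in col]
        total = sum(exps)
        for i in range(n):
            out[i][j] = exps[i] / total
    # L1-normalize each row so every position's weights over the s slots sum to 1.
    for i in range(n):
        row_sum = sum(out[i])
        out[i] = [v / row_sum for v in out[i]]
    return out


def external_attention(f: Matrix, m_k: Matrix, m_v: Matrix) -> Matrix:
    """f: n x d features; m_k, m_v: s x d memories shared by all samples."""
    attn = double_normalize(matmul(f, transpose(m_k)))  # n x s ("first linear layer")
    return matmul(attn, m_v)                            # n x d ("second linear layer")


if __name__ == "__main__":
    features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # N = 3 positions, d = 2
    m_k = [[1.0, 0.0], [0.0, 1.0]]                    # S = 2 key memory slots
    m_v = [[2.0, 0.0], [0.0, 2.0]]                    # S = 2 value memory slots
    print(external_attention(features, m_k, m_v))
```

Because M_k and M_v do not depend on the input, the same two memories are applied to every sample, which is how the mechanism can pick up correlations across the dataset; the work per sample is the two N×S×d products, so it grows linearly with N for a fixed S.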
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Visual Attention and Saliency Detection
