Global Self-Attention Networks for Image Recognition
Zhuoran Shen, Irwan Bello, Raviteja Vemulapalli, Xuhui Jia, Ching-Hui Chen

TL;DR
This paper introduces a novel global self-attention module, GSA, that efficiently models long-range pixel interactions in image recognition, outperforming convolutional and existing attention-based networks on CIFAR-100 and ImageNet.
Contribution
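The paper presents the GSA module, a global self-attention module efficient enough to serve as the backbone component of a deep network. Attention therefore no longer has to be confined to low-resolution feature maps or small local regions, and the resulting networks outperform prior attention-based methods that rely on those restrictions.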
Findings
GSA networks outperform convolutional networks on CIFAR-100 and ImageNet.
GSA networks use fewer parameters and less computation than the corresponding convolutional networks.
GSA networks outperform existing attention-based models on ImageNet.
Abstract
Recently, a series of works in computer vision have shown promising results on various image and video understanding tasks using self-attention. However, due to the quadratic computational and memory complexities of self-attention, these works either apply attention only to low-resolution feature maps in later stages of a deep network or restrict the receptive field of attention in each layer to a small local region. To overcome these limitations, this work introduces a new global self-attention module, referred to as the GSA module, which is efficient enough to serve as the backbone component of a deep network. This module consists of two parallel layers: a content attention layer that attends to pixels based only on their content and a positional attention layer that attends to pixels based on their spatial locations. The output of this module is the sum of the outputs of the two…
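The two-branch structure described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`content_attention`, `axis_positional_attention`, `gsa_module`), the linear-cost form of the content branch, the learned relative-offset embeddings, the column-then-row ordering, and the softmax in the positional branch are all assumptions chosen for illustration. It shows only how a content branch and a positional branch can run in parallel and be summed without ever forming an N×N attention matrix over all pixels.

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def content_attention(q, k, v):
    """Content branch (assumed linear-cost form). q, k: (N, d_k); v: (N, d_v)."""
    k_norm = softmax(k, axis=0)   # normalize each key channel over the N pixels
    context = k_norm.T @ v        # (d_k, d_v) global summary, no N x N matrix
    return q @ context            # each pixel reads from the shared summary

def axis_positional_attention(q, v, rel_emb):
    """Positional branch along one axis (assumed form).
    q: (A, B, d_k); v: (A, B, d_v); rel_emb: (2B-1, d_k), one row per offset."""
    B = q.shape[1]
    # idx[b, c] selects the embedding for the relative offset c - b
    idx = np.arange(B)[None, :] - np.arange(B)[:, None] + (B - 1)
    r = rel_emb[idx]                              # (B, B, d_k)
    logits = np.einsum('abd,bcd->abc', q, r)      # depends on offsets, not on keys
    weights = softmax(logits, axis=-1)            # assumption: softmax over the axis
    return np.einsum('abc,acd->abd', weights, v)  # (A, B, d_v)

def gsa_module(x, wq, wk, wv, rel_col, rel_row):
    """x: (H, W, d_in). Returns content output + positional output, (H, W, d_v)."""
    H, W, _ = x.shape
    q, k, v = x @ wq, x @ wk, x @ wv
    content = content_attention(
        q.reshape(H * W, -1), k.reshape(H * W, -1), v.reshape(H * W, -1)
    ).reshape(H, W, -1)
    # Attend along columns (the H axis) first, then along rows (the W axis).
    col = axis_positional_attention(
        q.transpose(1, 0, 2), v.transpose(1, 0, 2), rel_col
    ).transpose(1, 0, 2)
    pos = axis_positional_attention(q, col, rel_row)
    return content + pos

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
H, W, d_in, d_k, d_v = 4, 5, 8, 6, 7
x = rng.normal(size=(H, W, d_in))
wq = rng.normal(size=(d_in, d_k))
wk = rng.normal(size=(d_in, d_k))
wv = rng.normal(size=(d_in, d_v))
rel_col = rng.normal(size=(2 * H - 1, d_k))
rel_row = rng.normal(size=(2 * W - 1, d_k))
out = gsa_module(x, wq, wk, wv, rel_col, rel_row)
print(out.shape)  # (4, 5, 7)
```

In this sketch the content branch costs O(N·d_k·d_v) for N = H·W pixels, because the keys and values are pooled into a small (d_k, d_v) summary before the queries are applied. The positional branch works on one axis at a time, so its weight tensors have size H·W·H and H·W·W rather than (H·W)².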
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Visual Attention and Saliency Detection
