Global Self-Attention Networks for Image Recognition

Zhuoran Shen; Irwan Bello; Raviteja Vemulapalli; Xuhui Jia; Ching-Hui Chen

arXiv:2010.03019·cs.CV·October 15, 2020·20 cites

Open Access

TL;DR

This paper introduces a global self-attention module, GSA, that efficiently models long-range pixel interactions for image recognition. GSA networks outperform convolutional networks on CIFAR-100 and ImageNet, and outperform existing attention-based networks on ImageNet.

Contribution

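The paper presents the GSA module, which is efficient enough to serve as the backbone component of a deep network. Prior attention-based methods instead applied attention only to low-resolution feature maps in later stages or restricted each layer's receptive field to a small local region.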

Findings

01. GSA networks outperform convolutional networks on CIFAR-100 and ImageNet.

02. GSA networks use fewer parameters and less computation than the convolutional networks they are compared against.

03. GSA networks outperform existing attention-based models on ImageNet.

Abstract

Recently, a series of works in computer vision have shown promising results on various image and video understanding tasks using self-attention. However, due to the quadratic computational and memory complexities of self-attention, these works either apply attention only to low-resolution feature maps in later stages of a deep network or restrict the receptive field of attention in each layer to a small local region. To overcome these limitations, this work introduces a new global self-attention module, referred to as the GSA module, which is efficient enough to serve as the backbone component of a deep network. This module consists of two parallel layers: a content attention layer that attends to pixels based only on their content and a positional attention layer that attends to pixels based on their spatial locations. The output of this module is the sum of the outputs of the two…
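The two-branch structure described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it is a naive, quadratic-cost version written only to show one branch scoring pixels by content, another scoring pixels by spatial location, and the two outputs being summed. The function names, the dot-product content score, and the squared-distance positional score (with its `decay` parameter) are all assumptions made for illustration; the paper's efficient formulation is not reproduced here.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(scores, values):
    # Weighted average of the value vectors, weights = softmax(scores).
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def content_branch(features):
    # Each pixel attends to every pixel using feature dot products only.
    out = []
    for q in features:
        scores = [sum(a * b for a, b in zip(q, k)) for k in features]
        out.append(attend(scores, features))
    return out


def positional_branch(features, coords, decay=1.0):
    # Each pixel attends to every pixel using spatial offsets only.
    # ASSUMPTION: -decay * squared distance is a hypothetical stand-in
    # for whatever learned positional terms a real module would use.
    out = []
    for (r, c) in coords:
        scores = [-decay * ((r - r2) ** 2 + (c - c2) ** 2) for (r2, c2) in coords]
        out.append(attend(scores, features))
    return out


def two_branch_attention(features, height, width):
    # features: list of height*width channel vectors in row-major order.
    coords = [(r, c) for r in range(height) for c in range(width)]
    content = content_branch(features)
    positional = positional_branch(features, coords)
    # The module output is the sum of the two branch outputs.
    return [[a + b for a, b in zip(u, v)] for u, v in zip(content, positional)]


def pairwise_scores(height, width):
    # Number of attention scores a naive global layer computes per head.
    n = height * width
    return n * n


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]  # 2x2 map, 2 channels
    print(two_branch_attention(feats, 2, 2))
    print(pairwise_scores(56, 56))
```

The `pairwise_scores` helper makes the quadratic cost mentioned in the abstract concrete: the number of scores grows with the square of the pixel count, which is why naive global attention is usually confined to low-resolution feature maps.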

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Visual Attention and Saliency Detection