An Empirical Study of Spatial Attention Mechanisms in Deep Networks

Xizhou Zhu; Dazhi Cheng; Zheng Zhang; Stephen Lin; Jifeng Dai

arXiv:1904.05873·cs.CV·April 15, 2019·103 cites

TL;DR

This paper empirically studies how the individual components of spatial attention affect deep network performance across models and tasks. Some of its findings run counter to conventional understanding and point to room for better attention designs.

Contribution

It systematically ablates and compares spatial attention elements within a generalized attention formulation that encompasses Transformer attention, deformable convolution, and dynamic convolution, giving a clearer picture of the role each element plays.

Findings

01

The comparison of query and key content in Transformer attention is negligible for self-attention but vital for encoder-decoder attention.

02

A proper combination of deformable convolution with key-content-only saliency gives the best accuracy-efficiency tradeoff in self-attention.

03

The results suggest substantial room for improvement in the design of attention mechanisms.
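
The contrast drawn in the findings, between comparing query content with key content and scoring key content alone, can be illustrated with a toy sketch. The code below is not the paper's implementation: it is a minimal pure-Python illustration that omits the learned projections, multiple heads, and position terms, and the names `query_key_weights`, `key_only_weights`, and `saliency_vector` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def query_key_weights(queries, keys):
    """Query-key content comparison: each query gets its own distribution over keys."""
    scale = math.sqrt(len(keys[0]))
    return [softmax([dot(q, k) / scale for k in keys]) for q in queries]


def key_only_weights(saliency_vector, keys, num_queries):
    """Key-content-only saliency: one vector (assumed learned) scores each key,
    so every query shares the same distribution over keys."""
    shared = softmax([dot(saliency_vector, k) for k in keys])
    return [list(shared) for _ in range(num_queries)]


def attend(weights, values):
    """Weighted sum of value vectors for each row of attention weights."""
    dim = len(values[0])
    return [[sum(w * v[d] for w, v in zip(row, values)) for d in range(dim)]
            for row in weights]


# Toy inputs, chosen only for illustration.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
saliency_vector = [0.5, -0.5]  # hypothetical stand-in for a learned parameter

qk = query_key_weights(queries, keys)
ko = key_only_weights(saliency_vector, keys, len(queries))
qk_out = attend(qk, values)
ko_out = attend(ko, values)
```

In this sketch the key-only variant produces a single weight distribution shared by all queries, so the scores are computed once per key rather than once per query-key pair.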

Abstract

Attention mechanisms have become a popular component in deep neural networks, yet there has been little examination of how different influencing factors and methods for computing attention from these factors affect performance. Toward a better general understanding of attention mechanisms, we present an empirical study that ablates various spatial attention elements within a generalized attention formulation, encompassing the dominant Transformer attention as well as the prevalent deformable convolution and dynamic convolution modules. Conducted on a variety of applications, the study yields significant findings about spatial attention in deep networks, some of which run counter to conventional understanding. For example, we find that the query and key content comparison in Transformer attention is negligible for self-attention, but vital for encoder-decoder attention. A proper…

Code & Models

Repositories

open-mmlab/mmdetection (PyTorch)

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax