# AttentionRNN: A Structured Spatial Attention Mechanism

**Authors:** Siddhesh Khandelwal, Leonid Sigal

arXiv: 1905.09400 · 2019-05-24

## TL;DR

AttentionRNN is a structured spatial attention mechanism that explicitly models dependencies among attention variables, yielding improvements on image categorization, question answering, and image generation.

## Contribution

The paper proposes an end-to-end trainable attention layer that enforces structural dependencies by predicting attention values sequentially, in raster-scan and inverse raster-scan order, so that the resulting attention masks are more consistent.
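
One way to read "structural dependencies via sequential prediction" is as an autoregressive factorization of the attention mask. The notation below is an illustrative assumption rather than the paper's own: $A$ is an $H \times W$ attention mask with entries $a_{i,j}$, $X$ is the image or contextual feature input, and $a_{<(i,j)}$ stands for the attention values that precede position $(i,j)$ in raster-scan order.

$$
p(A \mid X) = \prod_{i=1}^{H} \prod_{j=1}^{W} p\left(a_{i,j} \mid a_{<(i,j)},\, X\right)
$$

Under this reading, the inverse raster-scan pass would use the same form with the ordering reversed, so that each value can be informed by neighbours on both sides.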

## Key findings

- Improves recognition accuracy on image categorization.
- Enhances performance in question answering tasks.
- Achieves better results in image generation.

## Abstract

Visual attention mechanisms have proven to be integral components of many modern deep neural architectures. They provide an efficient and effective way to utilize visual information selectively, which has been shown to be especially valuable in multi-modal learning tasks. However, all prior attention frameworks lack the ability to explicitly model structural dependencies among attention variables, making it difficult to predict consistent attention masks. In this paper we develop a novel structured spatial attention mechanism which is end-to-end trainable and can be integrated with any feed-forward convolutional neural network. The proposed AttentionRNN layer explicitly enforces structure over the spatial attention variables by sequentially predicting attention values in the spatial mask in a bi-directional raster-scan and inverse raster-scan order. As a result, each attention value depends not only on local image or contextual information, but also on the previously predicted attention values. Our experiments show consistent quantitative and qualitative improvements on a variety of recognition tasks and datasets, including image categorization, question answering, and image generation.
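
The sequential prediction described in the abstract can be illustrated with a toy NumPy sketch. This is not the authors' AttentionRNN layer: it swaps in a hand-written linear recurrence with a sigmoid for whatever recurrent unit the paper uses, and the function names (`scan_attention`, `bidirectional_attention`), the weights (`w_feat`, `w_prev`), and the averaging of the two passes are all assumptions made for illustration. It only shows the core idea that each attention value is computed from the local feature plus values already predicted earlier in the scan.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def scan_attention(features, w_feat, w_prev, reverse=False):
    """One directional pass over an (H, W, C) feature map.

    Each attention value depends on the local feature vector and on the
    two neighbours already visited in scan order (hypothetical toy rule).
    """
    h, w, _ = features.shape
    mask = np.zeros((h, w))
    rows = range(h - 1, -1, -1) if reverse else range(h)
    cols = range(w - 1, -1, -1) if reverse else range(w)
    # Offset that points at already-visited cells:
    # up/left for raster scan, down/right for inverse raster scan.
    step = 1 if reverse else -1
    for i in rows:
        for j in cols:
            pi, pj = i + step, j + step
            prev_row = mask[pi, j] if 0 <= pi < h else 0.0  # 0 outside the grid
            prev_col = mask[i, pj] if 0 <= pj < w else 0.0
            logit = (features[i, j] @ w_feat
                     + w_prev[0] * prev_row
                     + w_prev[1] * prev_col)
            mask[i, j] = sigmoid(logit)
    return mask


def bidirectional_attention(features, w_feat, w_prev):
    """Average a raster-scan pass and an inverse raster-scan pass.

    Averaging is an assumption of this sketch, not the paper's rule.
    """
    forward = scan_attention(features, w_feat, w_prev, reverse=False)
    backward = scan_attention(features, w_feat, w_prev, reverse=True)
    return 0.5 * (forward + backward)


# Usage with random stand-in features (4x5 grid, 8 channels).
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 5, 8))
w_feat = rng.normal(size=8)
w_prev = np.array([0.5, 0.5])

mask = bidirectional_attention(features, w_feat, w_prev)
attended = features * mask[..., None]  # apply the spatial mask per channel
```

Setting `w_prev` to zeros removes the dependence on earlier predictions, which reduces the sketch to ordinary per-location attention; comparing the two settings is a simple way to see what the sequential structure adds.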

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.09400/full.md

## Figures

56 figures with captions in the complete paper: https://tomesphere.com/paper/1905.09400/full.md

## References

42 references — full list in the complete paper: https://tomesphere.com/paper/1905.09400/full.md

---
Source: https://tomesphere.com/paper/1905.09400