Multi-label Image Recognition by Recurrently Discovering Attentional Regions

Zhouxia Wang; Tianshui Chen; Guanbin Li; Ruijia Xu; Liang Lin

arXiv:1711.02816·cs.CV·November 9, 2017·53 cites

Open Access

TL;DR

This paper introduces a recurrent memorized-attention deep architecture for multi-label image recognition that locates attentional regions without region proposals and captures global dependencies, achieving superior accuracy and efficiency.

Contribution

It presents a region-proposal-free attention mechanism that combines a spatial transformer layer with an LSTM to improve multi-label image classification.

Findings

01 Outperforms existing methods on MS-COCO and PASCAL VOC 07 datasets.

02 Achieves higher accuracy in multi-label recognition tasks.

03 Demonstrates improved computational efficiency.

Abstract

This paper proposes a novel deep architecture to address multi-label image recognition, a fundamental and practical task towards general visual understanding. Current solutions for this task usually rely on an extra step of extracting hypothesis regions (i.e., region proposals), resulting in redundant computation and sub-optimal performance. In this work, we achieve interpretable and contextualized multi-label image classification by developing a recurrent memorized-attention module. This module consists of two alternately performed components: i) a spatial transformer layer to locate attentional regions from the convolutional feature maps in a region-proposal-free way and ii) an LSTM (Long Short-Term Memory) sub-network to sequentially predict semantic labeling scores on the located regions while capturing the global dependencies of these regions. The LSTM also outputs the…
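As a rough illustration of the first component, the sketch below shows how a spatial-transformer-style layer can read a region out of a feature map without region proposals: an affine matrix maps each output cell to a source location, and bilinear interpolation reads the value there. It is a minimal pure-Python sketch on a single-channel map, not the authors' implementation. The function names (`affine_grid`, `bilinear_sample`, `attend`), the align-corners coordinate convention, the zero padding outside the map, and the hand-picked scale-and-translate matrix are all assumptions made for illustration. The LSTM component is omitted, so the transform is fixed by hand.

```python
import math


def affine_grid(theta, out_h, out_w):
    """Map each output cell to normalized source coordinates in [-1, 1].

    theta is a 2x3 affine matrix [[a, b, tx], [c, d, ty]] (assumed layout).
    """
    grid = []
    for i in range(out_h):
        # Normalized target y; a single row maps to the centre (0.0).
        yt = -1.0 + 2.0 * i / (out_h - 1) if out_h > 1 else 0.0
        row = []
        for j in range(out_w):
            xt = -1.0 + 2.0 * j / (out_w - 1) if out_w > 1 else 0.0
            xs = theta[0][0] * xt + theta[0][1] * yt + theta[0][2]
            ys = theta[1][0] * xt + theta[1][1] * yt + theta[1][2]
            row.append((xs, ys))
        grid.append(row)
    return grid


def bilinear_sample(fmap, xs, ys):
    """Bilinearly interpolate fmap at normalized (xs, ys); zero outside the map."""
    h, w = len(fmap), len(fmap[0])
    # Convert normalized coordinates to pixel coordinates (align-corners style).
    x = (xs + 1.0) * (w - 1) / 2.0
    y = (ys + 1.0) * (h - 1) / 2.0
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1.0 - abs(x - xx)) * (1.0 - abs(y - yy))
                value += weight * fmap[yy][xx]
    return value


def attend(fmap, theta, out_h, out_w):
    """Extract an out_h x out_w attentional region from fmap using theta."""
    grid = affine_grid(theta, out_h, out_w)
    return [[bilinear_sample(fmap, xs, ys) for (xs, ys) in row] for row in grid]


# Toy 3x3 single-channel feature map.
feature_map = [[1, 2, 3],
               [4, 5, 6],
               [7, 8, 9]]

# Hypothetical transform: scale by 0.5 and shift toward the bottom-right corner.
theta = [[0.5, 0.0, 0.5],
         [0.0, 0.5, 0.5]]

region = attend(feature_map, theta, 2, 2)
```

With this hand-picked `theta`, `region` covers the bottom-right 2x2 block of the toy map. Because sampling is a smooth function of the transform parameters, this kind of layer can be trained end to end, which is what lets it stand in for a separate proposal step.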

Taxonomy

Topics: Text and Document Classification Technologies · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Spatial Transformer · Sigmoid Activation · Tanh Activation · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing