Multi-label Image Recognition by Recurrently Discovering Attentional Regions
Zhouxia Wang, Tianshui Chen, Guanbin Li, Ruijia Xu, Liang Lin

TL;DR
This paper introduces a recurrent memorized-attention deep architecture for multi-label image recognition. It locates attentional regions without region proposals, captures global dependencies among those regions, and reports better accuracy and efficiency than existing methods on MS-COCO and PASCAL VOC 07.
Contribution
It presents a novel region-proposal-free attention mechanism that combines a spatial transformer layer with an LSTM sub-network to improve multi-label image classification.
Findings
Outperforms existing methods on MS-COCO and PASCAL VOC 07 datasets.
Achieves higher accuracy in multi-label recognition tasks.
Demonstrates improved computational efficiency.
Abstract
This paper proposes a novel deep architecture for multi-label image recognition, a fundamental and practical task towards general visual understanding. Current solutions for this task usually rely on an extra step of extracting hypothesis regions (i.e., region proposals), resulting in redundant computation and sub-optimal performance. In this work, we achieve interpretable and contextualized multi-label image classification by developing a recurrent memorized-attention module. This module consists of two alternately performed components: i) a spatial transformer layer that locates attentional regions from the convolutional feature maps in a region-proposal-free way, and ii) an LSTM (Long Short-Term Memory) sub-network that sequentially predicts semantic labeling scores on the located regions while capturing the global dependencies of these regions. The LSTM also outputs the…
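As a rough illustration of the first component, the sketch below shows how a spatial-transformer-style layer can crop an attentional region from a feature map by bilinear sampling under an affine transform, with no region proposals involved. It is a minimal pure-Python sketch, not the authors' implementation: the function names `bilinear` and `attend`, the restriction of the transform to scale and translation `(sx, sy, tx, ty)`, the single-channel toy feature map, and the hand-picked transform parameters are all assumptions made for illustration. The LSTM sub-network is left out entirely.

```python
import math

def bilinear(fmap, x, y):
    """Sample a 2D feature map (list of rows) at continuous pixel
    coordinates (x, y); positions outside the map contribute zero."""
    h, w = len(fmap), len(fmap[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= xi < w and 0 <= yi < h:
                weight = (1 - abs(x - xi)) * (1 - abs(y - yi))
                value += weight * fmap[yi][xi]
    return value

def attend(fmap, theta, out_size):
    """Extract an out_size x out_size region from fmap.

    theta = (sx, sy, tx, ty) is a hypothetical scale-and-translation
    transform in normalized [-1, 1] coordinates (an assumption; a
    general affine transform would also allow rotation and shear).
    """
    h, w = len(fmap), len(fmap[0])
    sx, sy, tx, ty = theta
    region = []
    for i in range(out_size):
        # Normalized output row coordinate in [-1, 1].
        v = -1 + 2 * i / (out_size - 1) if out_size > 1 else 0.0
        row = []
        for j in range(out_size):
            u = -1 + 2 * j / (out_size - 1) if out_size > 1 else 0.0
            # Map the output grid point into the source map.
            xs, ys = sx * u + tx, sy * v + ty
            # Convert normalized coordinates to pixel coordinates.
            px = (xs + 1) * (w - 1) / 2
            py = (ys + 1) * (h - 1) / 2
            row.append(bilinear(fmap, px, py))
        region.append(row)
    return region

# Toy 4x4 single-channel feature map with value 4*y + x at (x, y).
fmap = [[4 * y + x for x in range(4)] for y in range(4)]

# Hand-picked transforms standing in for per-step attention parameters.
thetas = [(1.0, 1.0, 0.0, 0.0), (0.5, 0.5, 0.5, 0.5)]
regions = [attend(fmap, theta, 2) for theta in thetas]
```

Each entry of `regions` is a fixed-size crop taken from a different location and scale of the same feature map, so the sampling is cheap to repeat over several steps. In a full model the per-step regions would be fed to the recurrent sub-network for label scoring; that part is not sketched here.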
Taxonomy
Topics: Text and Document Classification Technologies · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Spatial Transformer · Sigmoid Activation · Tanh Activation · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing
