Deep Attentional Structured Representation Learning for Visual Recognition
Krishna Kanth Nakka, Mathieu Salzmann

TL;DR
This paper introduces an attention-based deep structured representation learning framework that selectively focuses on discriminative image regions, improving visual recognition performance without extra supervision.
Contribution
It proposes a novel end-to-end attention mechanism integrated into structured representation learning, enhancing discriminative feature aggregation for complex recognition tasks.
Findings
Outperforms attention-less methods on benchmark datasets
Achieves state-of-the-art results in scene recognition
Effectively predicts attention maps without additional supervision
Abstract
Structured representations, such as Bags of Words, VLAD and Fisher Vectors, have proven highly effective to tackle complex visual recognition tasks. As such, they have recently been incorporated into deep architectures. However, while effective, the resulting deep structured representation learning strategies typically aggregate local features from the entire image, ignoring the fact that, in complex recognition tasks, some regions provide much more discriminative information than others. In this paper, we introduce an attentional structured representation learning framework that incorporates an image-specific attention mechanism within the aggregation process. Our framework learns to predict jointly the image class label and an attention map in an end-to-end fashion and without any other supervision than the target label. As evidenced by our experiments, this consistently outperforms…
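To make the idea of attention inside the aggregation step concrete, the following is a minimal NumPy sketch of a VLAD-style aggregation in which each local feature's contribution is scaled by a per-location attention weight. It is an illustration under stated assumptions, not the authors' implementation: the function name `attentional_vlad`, the soft-assignment sharpness `alpha`, and the normalization steps are hypothetical choices, and the attention weights are simply passed in as an array, whereas the paper learns them jointly with the classifier.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attentional_vlad(features, centers, attention, alpha=1.0):
    """Attention-weighted VLAD-style aggregation (illustrative sketch).

    features:  (N, D) local descriptors, one per image location
    centers:   (K, D) codebook of visual words
    attention: (N,)   non-negative weight per location (assumed given here)
    alpha:     sharpness of the soft assignment (hypothetical parameter)
    Returns a flattened, L2-normalized descriptor of length K * D.
    """
    # Residual of every feature with respect to every codeword: (N, K, D)
    diffs = features[:, None, :] - centers[None, :, :]
    # Soft assignment of each feature to the codewords: (N, K)
    sq_dists = (diffs ** 2).sum(axis=2)
    assign = softmax(-alpha * sq_dists, axis=1)
    # Scale each feature's assignment by its attention weight
    weights = attention[:, None] * assign
    # Weighted sum of residuals per codeword: (K, D)
    vlad = (weights[:, :, None] * diffs).sum(axis=0)
    # Per-codeword normalization, then global L2 normalization
    norms = np.linalg.norm(vlad, axis=1, keepdims=True)
    vlad = vlad / np.maximum(norms, 1e-12)
    flat = vlad.reshape(-1)
    return flat / max(np.linalg.norm(flat), 1e-12)


# Usage with random data: 6 locations, 4-dim features, 3 codewords
rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 4))
codebook = rng.normal(size=(3, 4))
attn = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])  # ignore the last 3 locations
descriptor = attentional_vlad(feats, codebook, attn)
```

In this sketch, a location with attention weight zero contributes nothing to the final descriptor, which is the behaviour the abstract describes when it says some regions are more discriminative than others. Setting all weights to one recovers an ordinary soft-assignment VLAD over the entire image.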
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
