Deep Attentional Structured Representation Learning for Visual Recognition

Krishna Kanth Nakka; Mathieu Salzmann

arXiv:1805.05389·cs.CV·May 16, 2018·5 cites

TL;DR

This paper introduces an attention-based deep structured representation learning framework that selectively focuses on discriminative image regions, improving visual recognition performance without extra supervision.

Contribution

It proposes a novel end-to-end attention mechanism integrated into structured representation learning, enhancing discriminative feature aggregation for complex recognition tasks.

Findings

1. Outperforms attention-less methods on benchmark datasets
2. Achieves state-of-the-art results in scene recognition
3. Effectively predicts attention maps without additional supervision

Abstract

Structured representations, such as Bags of Words, VLAD and Fisher Vectors, have proven highly effective for tackling complex visual recognition tasks. As such, they have recently been incorporated into deep architectures. However, the resulting deep structured representation learning strategies typically aggregate local features from the entire image, ignoring the fact that, in complex recognition tasks, some regions provide much more discriminative information than others. In this paper, we introduce an attentional structured representation learning framework that incorporates an image-specific attention mechanism within the aggregation process. Our framework learns to jointly predict the image class label and an attention map in an end-to-end fashion, with no supervision other than the target label. As evidenced by our experiments, this consistently outperforms…
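To make the idea of attention inside the aggregation step concrete, here is a minimal sketch in plain Python. It is not the authors' formulation, which the abstract does not spell out. It assumes a VLAD-style aggregation with soft assignment (as in NetVLAD-like layers) and takes the per-location attention weights as a given input rather than predicting them. The names `soft_assign`, `l2_normalize`, `attentional_vlad`, and the sharpness parameter `alpha` are hypothetical.

```python
import math

def soft_assign(x, centers, alpha=10.0):
    """Softmax over negative squared distances from x to each center."""
    d2 = [sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centers]
    m = min(d2)  # shift by the smallest distance for numerical stability
    e = [math.exp(-alpha * (d - m)) for d in d2]
    s = sum(e)
    return [v / s for v in e]

def l2_normalize(v, eps=1e-12):
    """Scale v to unit L2 norm (eps avoids division by zero)."""
    n = math.sqrt(sum(a * a for a in v))
    return [a / (n + eps) for a in v]

def attentional_vlad(features, centers, attention, alpha=10.0):
    """Aggregate local features into one descriptor of length K * D.

    features:  N local descriptors, each of dimension D
    centers:   K cluster centers (codewords), each of dimension D
    attention: N non-negative weights, one per local descriptor
               (assumed given here; a network would predict them)
    """
    K, D = len(centers), len(centers[0])
    vlad = [[0.0] * D for _ in range(K)]
    for x, a in zip(features, attention):
        assign = soft_assign(x, centers, alpha)
        for k in range(K):
            w = a * assign[k]  # attention rescales each residual's vote
            for d in range(D):
                vlad[k][d] += w * (x[d] - centers[k][d])
    # Normalize each codeword's residual sum, then the whole descriptor.
    flat = [v for row in vlad for v in l2_normalize(row)]
    return l2_normalize(flat)
```

With uniform attention this reduces to ordinary soft-assignment VLAD. Setting a location's weight to zero removes it from the descriptor entirely, which is the sense in which less discriminative regions can be suppressed during aggregation.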

Code & Models

Repositories

ahmedest61/vlad-buff (PyTorch)

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications