An attention-driven hierarchical multi-scale representation for visual recognition

Zachary Wharton; Ardhendu Behera; Asish Bera

arXiv:2110.12178 · cs.CV · October 26, 2021

TL;DR

This paper introduces an attention-driven hierarchical multi-scale representation that uses Graph Convolutional Networks (GCNs) to capture long-range dependencies among image regions, improving both fine-grained and generic visual recognition.

Contribution

The paper proposes a novel GCN-based method that uses attention-driven message propagation to model hierarchical multi-scale features for visual recognition.

Findings

01 Outperforms state-of-the-art on three datasets

02 Effective for fine-grained classification

03 Competitive on additional datasets

Abstract

Convolutional Neural Networks (CNNs) have revolutionized the understanding of visual content. This is mainly due to their ability to break down an image into smaller pieces, extract multi-scale localized features and compose them to construct highly expressive representations for decision making. However, the convolution operation is unable to capture long-range dependencies such as arbitrary relations between pixels since it operates on a fixed-size window. Therefore, it may not be suitable for discriminating subtle changes (e.g. fine-grained visual recognition). To this end, our proposed method captures the high-level long-range dependencies by exploring Graph Convolutional Networks (GCNs), which aggregate information by establishing relationships among multi-scale hierarchical regions. These regions consist of smaller (closer look) to larger (far look), and the dependency between…
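The idea of aggregating information across regions with a graph convolution can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `attention_gcn_layer`, the scaled dot-product attention used to weight edges, the random region features, and the mean pooling at the end are all assumptions chosen for illustration. It shows only the general pattern of attention-weighted message propagation over a fully connected graph of region descriptors.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention_gcn_layer(region_feats, weight):
    """One attention-weighted graph convolution step (illustrative only).

    region_feats: (N, D) array, one feature vector per image region.
    weight:       (D, D_out) projection matrix (hypothetical, random here).
    Returns the updated (N, D_out) node features and the (N, N) attention
    matrix that plays the role of a soft adjacency matrix.
    """
    n_regions, dim = region_feats.shape
    # Pairwise similarity between regions; an assumed choice of edge score.
    scores = region_feats @ region_feats.T / np.sqrt(dim)
    # Each row sums to 1, so every region takes a weighted average of all regions.
    attn = softmax(scores, axis=1)
    # Message propagation: aggregate neighbours, project, then apply ReLU.
    propagated = attn @ region_feats
    updated = np.maximum(propagated @ weight, 0.0)
    return updated, attn


# Hypothetical input: 6 regions (e.g. small to large crops), 8-dim features each.
rng = np.random.default_rng(0)
region_feats = rng.standard_normal((6, 8))
weight = rng.standard_normal((8, 4)) * 0.1

node_out, attn = attention_gcn_layer(region_feats, weight)
# Pool node features into a single image-level descriptor (assumed pooling).
image_repr = node_out.mean(axis=0)
```

Because the attention matrix connects every region to every other region, a single layer lets a small region exchange information with a distant large one, which a fixed-size convolution window cannot do directly.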

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Visual Attention and Saliency Detection · Domain Adaptation and Few-Shot Learning

Methods: Convolution