Context-aware Attentional Pooling (CAP) for Fine-grained Visual Classification

Ardhendu Behera; Zachary Wharton; Pradeep Hewage; Asish Bera

arXiv:2101.06635 · cs.CV · January 19, 2021


TL;DR

This paper introduces Context-aware Attentional Pooling (CAP), a method for fine-grained visual classification that captures subtle variations without bounding-box or part annotations and outperforms state-of-the-art methods on six benchmark datasets.

Contribution

The paper proposes a new context-aware attentional pooling technique that captures subtle discriminative features and encodes semantic correlations, improving fine-grained classification accuracy.

Findings

01. Outperforms state-of-the-art methods on six benchmark datasets.

02. Effectively captures subtle variations without bounding-box annotations.

03. Compatible with multiple backbone networks.

Abstract

Deep convolutional neural networks (CNNs) have shown a strong ability in mining discriminative object pose and parts information for image recognition. For fine-grained recognition, context-aware rich feature representation of object/scene plays a key role since it exhibits a significant variance in the same subcategory and subtle variance among different subcategories. Finding the subtle variance that fully characterizes the object/scene is not straightforward. To address this, we propose a novel context-aware attentional pooling (CAP) that effectively captures subtle changes via sub-pixel gradients, and learns to attend informative integral regions and their importance in discriminating different subcategories without requiring the bounding-box and/or distinguishable part annotations. We also introduce a novel feature encoding by considering the intrinsic consistency between the…
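The abstract describes learning to attend to informative regions and weighting them by importance. As a rough illustration of that general idea only, the following is a minimal pure-Python sketch of generic attention-weighted pooling over region feature vectors. It is not the paper's CAP implementation: the function names `context_vectors` and `attentional_pool`, the scaled dot-product scoring, and the `query` vector (standing in for a learned parameter) are all assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def context_vectors(regions):
    """Hypothetical helper: for each region, build a context vector as an
    attention-weighted sum of all regions (scaled dot-product scores)."""
    dim = len(regions[0])
    scale = math.sqrt(dim)
    out = []
    for q in regions:
        weights = softmax([dot(q, k) / scale for k in regions])
        out.append([sum(w * r[d] for w, r in zip(weights, regions))
                    for d in range(dim)])
    return out


def attentional_pool(regions, query):
    """Hypothetical helper: pool region vectors into one vector, weighting
    each region by softmax(region . query). `query` stands in for a
    learned importance vector (an assumption of this sketch)."""
    dim = len(regions[0])
    weights = softmax([dot(r, query) for r in regions])
    pooled = [sum(w * r[d] for w, r in zip(weights, regions))
              for d in range(dim)]
    return pooled, weights


# Usage: three toy 2-D region features and an assumed query vector.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, weights = attentional_pool(context_vectors(regions), [1.0, 0.0])
```

In this sketch the pooled vector is a convex combination of the context vectors, so regions that score higher against the query contribute more to the final representation. A real model would learn the scoring parameters end to end; here they are fixed toy values.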


Code & Models

Repositories

ArdhenduBehera/cap (tf)

Videos

Context-Aware Attentional Pooling (CAP) for Fine-Grained Visual Classification