TL;DR
This paper introduces Context-aware Attentional Pooling (CAP), a method for fine-grained visual classification that captures subtle variations without bounding-box or part annotations and outperforms state-of-the-art methods on six benchmark datasets.
Contribution
The paper proposes a new context-aware attentional pooling technique that captures subtle discriminative features and encodes semantic correlations, improving fine-grained classification accuracy.
Findings
Outperforms state-of-the-art methods on six benchmark datasets.
Effectively captures subtle variations without bounding-box annotations.
Compatible with multiple backbone networks.
Abstract
Deep convolutional neural networks (CNNs) have shown a strong ability in mining discriminative object pose and parts information for image recognition. For fine-grained recognition, context-aware rich feature representation of object/scene plays a key role since it exhibits a significant variance in the same subcategory and subtle variance among different subcategories. Finding the subtle variance that fully characterizes the object/scene is not straightforward. To address this, we propose a novel context-aware attentional pooling (CAP) that effectively captures subtle changes via sub-pixel gradients, and learns to attend informative integral regions and their importance in discriminating different subcategories without requiring the bounding-box and/or distinguishable part annotations. We also introduce a novel feature encoding by considering the intrinsic consistency between the…
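As a rough illustration of the attentional-pooling idea described in the abstract, the sketch below re-expresses each region feature as an attention-weighted mix of all regions, then pools the regions using a learned importance weight. This is not the authors' implementation: it substitutes plain scaled dot-product self-attention, and the names `attentional_pool`, `w_query`, `w_key`, and `v_score` are hypothetical, as are all shapes.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attentional_pool(regions, w_query, w_key, v_score):
    """Pool R region features of dimension D into one D-vector.

    regions: (R, D) features, e.g. pooled from a CNN feature map (assumed).
    w_query, w_key: (D, K) projection matrices (hypothetical parameters).
    v_score: (D,) vector scoring the importance of each region (hypothetical).
    """
    q = regions @ w_query                      # (R, K)
    k = regions @ w_key                        # (R, K)
    scores = q @ k.T / np.sqrt(q.shape[1])     # (R, R) pairwise similarity
    attn = softmax(scores, axis=1)             # each row sums to 1
    context = attn @ regions                   # (R, D) context-aware features
    importance = softmax(context @ v_score)    # (R,) weight per region
    pooled = importance @ context              # (D,) pooled descriptor
    return pooled, importance
```

In practice the projections and scoring vector would be learned end to end with the classifier; here they are simply passed in as arrays so the data flow is visible.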
