Diversified Visual Attention Networks for Fine-Grained Object Classification
Bo Zhao, Xiao Wu, Jiashi Feng, Qiang Peng, Shuicheng Yan

TL;DR
This paper introduces a diversified visual attention network (DVAN) that improves fine-grained object classification by explicitly promoting attention diversity, reducing reliance on strong supervision, and effectively capturing discriminative features across multiple scales.
Contribution
The paper proposes a novel DVAN model that explicitly encourages attention diversity and leverages multiple attention canvases with an LSTM to enhance fine-grained classification performance.
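The summary above does not give the paper's formulation, so the following is only a minimal sketch of what "encouraging attention diversity" can mean in practice. It assumes attention maps are softmax-normalized over spatial locations and penalizes the overlap between maps at consecutive time steps. The function names (`softmax`, `overlap`, `diversity_penalty`) and the overlap measure are illustrative assumptions, not DVAN's actual loss.

```python
import math


def softmax(scores):
    """Normalize a flat list of scores into attention weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def overlap(a, b):
    """Attention mass shared by two normalized maps (0 = disjoint, 1 = identical)."""
    return sum(min(x, y) for x, y in zip(a, b))


def diversity_penalty(attention_maps):
    """Mean overlap between attention maps at consecutive time steps.

    Hypothetical penalty: lower values mean the steps attend to
    different locations. Returns 0.0 when there are fewer than two maps.
    """
    pairs = list(zip(attention_maps, attention_maps[1:]))
    if not pairs:
        return 0.0
    return sum(overlap(a, b) for a, b in pairs) / len(pairs)


# Three time steps over a flattened 2x2 grid of locations.
same_region = [softmax([5.0, 0.0, 0.0, 0.0])] * 3
different_regions = [
    softmax([5.0, 0.0, 0.0, 0.0]),
    softmax([0.0, 5.0, 0.0, 0.0]),
    softmax([0.0, 0.0, 5.0, 0.0]),
]

print(diversity_penalty(same_region))        # high: every step looks at the same place
print(diversity_penalty(different_regions))  # low: each step looks somewhere new
```

In a trainable model, a term like this would be added to the classification loss with a weighting coefficient, so that the attention steps are pushed toward distinct regions while the classifier still decides which regions matter.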
Findings
Achieves competitive results on CUB-2011, Stanford Dogs, and Stanford Cars datasets.
Reduces dependency on strongly-supervised localization information.
Demonstrates the effectiveness of attention diversity in fine-grained classification.
Abstract
Fine-grained object classification is a challenging task due to subtle inter-class differences and large intra-class variation. Recently, visual attention models have been applied to automatically localize the discriminative regions of an image to better capture critical differences, and have demonstrated promising performance. However, without considering diversity in the attention process, most existing attention models perform poorly in classifying fine-grained objects. In this paper, we propose a diversified visual attention network (DVAN) to address the problems of fine-grained object classification, which substantially relieves the dependency on strongly-supervised information for learning to localize discriminative regions compared with attentionless models. More importantly, DVAN explicitly pursues the diversity of attention and is able to gather discriminative…
Taxonomy
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
