Fine-Grained Visual Classification via Simultaneously Learning of Multi-regional Multi-grained Features
Dongliang Chang, Yixiao Zheng, Zhanyu Ma, Ruoyi Du, Kongming Liang

TL;DR
This paper introduces a novel loss function, TDSA-Loss, that enables the simultaneous learning of multi-regional and multi-grained features for improved fine-grained visual classification, avoiding complex model structures.
Contribution
The paper proposes TDSA-Loss with multi-stage channel constraints and top-down spatial attention to effectively mine discriminative regions and details in fine-grained images.
Findings
TDSA-Loss yields significant improvements on four fine-grained datasets.
It extracts multi-regional and multi-grained features effectively.
In ablation studies, both modules outperform the baseline methods.
Abstract
Fine-grained visual classification is a challenging task that recognizes the sub-classes belonging to the same meta-class. Large inter-class similarity and intra-class variance are the main challenges of this task. Most existing methods try to solve this problem by designing complex model structures to explore more minute and discriminative regions. In this paper, we argue that mining multi-regional multi-grained features is precisely the key to this task. Specifically, we introduce a new loss function, termed top-down spatial attention loss (TDSA-Loss), which contains a multi-stage channel constrained module and a top-down spatial attention module. The multi-stage channel constrained module aims to make the feature channels in different stages category-aligned. Meanwhile, the top-down spatial attention module uses the attention map generated by high-level aligned feature channels to make…
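The two ideas named in the abstract, category-aligned channels and an attention map derived from high-level features, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes that "category-aligned" means each class owns a contiguous group of channels, that class scores come from max pooling within each group followed by global average pooling, and that the attention map is a sigmoid over the channel-wise mean. The function names `class_scores_from_aligned_channels` and `spatial_attention_map` are hypothetical. How the map enters the loss is not covered, because the abstract is truncated at that point.

```python
import numpy as np


def class_scores_from_aligned_channels(feat, num_classes):
    """Turn a (C, H, W) feature map into one score per class.

    Assumption (not from the source): channels [i*k, (i+1)*k) belong to class i,
    where k = C // num_classes.
    """
    c, h, w = feat.shape
    if c % num_classes != 0:
        raise ValueError("channel count must be a multiple of num_classes")
    k = c // num_classes
    grouped = feat.reshape(num_classes, k, h, w)
    # Max pooling across the k channels of each class group -> (num_classes, H, W)
    pooled = grouped.max(axis=1)
    # Global average pooling over the spatial dimensions -> (num_classes,)
    return pooled.mean(axis=(1, 2))


def spatial_attention_map(high_feat, out_hw):
    """Build an attention map from a high-level (C, h, w) feature map.

    Assumption (not from the source): average over channels, apply a sigmoid,
    then upsample by nearest neighbour to the lower-level resolution out_hw.
    """
    avg = high_feat.mean(axis=0)              # (h, w)
    att = 1.0 / (1.0 + np.exp(-avg))          # sigmoid, values in (0, 1)
    h, w = att.shape
    out_h, out_w = out_hw
    rows = np.arange(out_h) * h // out_h      # nearest-neighbour row indices
    cols = np.arange(out_w) * w // out_w      # nearest-neighbour column indices
    return att[np.ix_(rows, cols)]            # (out_h, out_w)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    high = rng.normal(size=(6, 2, 2))         # 3 classes x 2 channels, 2x2 grid
    print(class_scores_from_aligned_channels(high, num_classes=3).shape)
    print(spatial_attention_map(high, (4, 4)).shape)
```

In this sketch the attention map has the spatial size of the lower-level stage, so it could be compared or multiplied element-wise with features from that stage; which of these the paper does is left open here.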
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Visual Attention and Saliency Detection · Advanced Neural Network Applications
Methods: Sigmoid Activation · Average Pooling · Max Pooling · Convolution
