Fine-Grained Visual Classification via Simultaneously Learning of Multi-regional Multi-grained Features

Dongliang Chang; Yixiao Zheng; Zhanyu Ma; Ruoyi Du; Kongming Liang

arXiv:2102.00367·cs.CV·February 2, 2021·1 cites

TL;DR

This paper introduces a novel loss function, TDSA-Loss, that enables the simultaneous learning of multi-regional and multi-grained features for improved fine-grained visual classification, avoiding complex model structures.

Contribution

The paper proposes TDSA-Loss with multi-stage channel constraints and top-down spatial attention to effectively mine discriminative regions and details in fine-grained images.

Findings

01. Significant improvement on four fine-grained datasets.
02. Effective multi-regional and multi-grained feature extraction.
03. Modules outperform baseline methods in ablation studies.

Abstract

Fine-grained visual classification is a challenging task that recognizes the sub-classes belonging to the same meta-class. Large inter-class similarity and intra-class variance are the main challenges of this task. Most existing methods try to solve this problem by designing complex model structures to explore more minute and discriminative regions. In this paper, we argue that mining multi-regional multi-grained features is precisely the key to this task. Specifically, we introduce a new loss function, termed top-down spatial attention loss (TDSA-Loss), which contains a multi-stage channel constrained module and a top-down spatial attention module. The multi-stage channel constrained module aims to make the feature channels in different stages category-aligned. Meanwhile, the top-down spatial attention module uses the attention map generated by high-level aligned feature channels to make…
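
The two ideas named in the abstract, category-aligned feature channels and an attention map derived from them, can be illustrated with a small sketch. This is not the paper's implementation: the equal split of channels across classes, the cross-channel max pooling, the global average pooling, the softmax cross-entropy, and the sigmoid over the channel mean are all assumptions chosen for illustration, and every function name below is hypothetical. The sketch stops at producing the attention map, because the abstract is truncated before it says how the map is used.

```python
import math

def group_channels(features, num_classes):
    """Split a C x H x W feature map (nested lists) into equal per-class channel groups.

    Assumption: each class owns a contiguous block of C / num_classes channels.
    """
    per_class, remainder = divmod(len(features), num_classes)
    if per_class == 0 or remainder != 0:
        raise ValueError("channel count must be a positive multiple of num_classes")
    return [features[k * per_class:(k + 1) * per_class] for k in range(num_classes)]

def class_scores(features, num_classes):
    """Max over the channels of each group at every position, then a spatial average."""
    scores = []
    for group in group_channels(features, num_classes):
        h, w = len(group[0]), len(group[0][0])
        pooled = [max(ch[i][j] for ch in group) for i in range(h) for j in range(w)]
        scores.append(sum(pooled) / len(pooled))
    return scores

def cross_entropy(scores, label):
    """Softmax cross-entropy over per-class scores, shifted by the max for stability."""
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum - scores[label]

def attention_map(features, num_classes, label):
    """Sigmoid of the mean over the channels assigned to `label` (an assumed form)."""
    group = group_channels(features, num_classes)[label]
    h, w = len(group[0]), len(group[0][0])
    return [[1.0 / (1.0 + math.exp(-sum(ch[i][j] for ch in group) / len(group)))
             for j in range(w)] for i in range(h)]

# Toy input: 4 channels, 2 classes (2 channels each), 2 x 2 spatial grid.
features = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 2.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
scores = class_scores(features, num_classes=2)
loss_true = cross_entropy(scores, label=0)
loss_wrong = cross_entropy(scores, label=1)
attn = attention_map(features, num_classes=2, label=0)
```

In this toy input only the channels assigned to class 0 respond, so the channel-constraint loss is lower for label 0 than for label 1, and the attention map is largest at the positions where those channels are active.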

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Visual Attention and Saliency Detection · Advanced Neural Network Applications

Methods: Sigmoid Activation · Average Pooling · Max Pooling · Convolution