Context-Semantic Quality Awareness Network for Fine-Grained Visual Categorization
Qin Xu, Sitong Li, Jiahui Wang, Bo Jiang, Jinhui Tang

TL;DR
This paper introduces CSQA-Net, a weakly supervised network for fine-grained visual categorization (FGVC) that models spatial context and assesses semantic quality to extract more discriminative features, outperforming state-of-the-art methods on four FGVC datasets.
Contribution
The paper proposes a multi-part and multi-scale cross-attention (MPMSCA) module and a multi-level semantic quality evaluation (MLSQE) module to better capture and supervise discriminative features in FGVC.
Findings
CSQA-Net outperforms state-of-the-art methods on four FGVC datasets.
The MPMSCA module effectively models spatial relationships for finer details.
The MLSQE module improves hierarchical semantic supervision.
Abstract
Exploring and mining subtle yet distinctive features between sub-categories with similar appearances is crucial for fine-grained visual categorization (FGVC). However, less effort has been devoted to assessing the quality of extracted visual representations. Intuitively, the network may struggle to capture discriminative features from low-quality samples, which leads to a significant decline in FGVC performance. To tackle this challenge, we propose a weakly supervised Context-Semantic Quality Awareness Network (CSQA-Net) for FGVC. In this network, we design a novel multi-part and multi-scale cross-attention (MPMSCA) module that models the spatial contextual relationship between rich part descriptors and global semantics, so as to capture more discriminative details within the object. Before the MPMSCA module is applied, a part navigator is developed to address the scale confusion problems…
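As a rough illustration of the cross-attention idea the abstract describes, the sketch below lets a few part descriptors attend over a set of global feature tokens using plain scaled dot-product attention. It is not the authors' MPMSCA module: the function names, the tensor shapes, the choice of parts as queries and global tokens as keys and values, and the absence of learned projections and multi-scale handling are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(parts, global_tokens):
    """Scaled dot-product cross-attention (hypothetical helper).

    parts:         (P, d) array of part descriptors, used as queries.
    global_tokens: (N, d) array of global features, used as keys and values.
    Returns the context-enriched parts (P, d) and attention weights (P, N).
    """
    d = parts.shape[-1]
    scores = parts @ global_tokens.T / np.sqrt(d)  # (P, N) similarities
    weights = softmax(scores, axis=-1)             # each row sums to 1
    attended = weights @ global_tokens             # (P, d) weighted sums
    return attended, weights


# Toy shapes chosen for illustration only: 4 parts, a 7x7 grid of tokens, d=8.
rng = np.random.default_rng(0)
parts = rng.standard_normal((4, 8))
global_tokens = rng.standard_normal((49, 8))
attended, weights = cross_attention(parts, global_tokens)
```

Each row of `weights` indicates how strongly one part descriptor draws on each global location, which is one simple way to read "spatial contextual relationship" in this setting.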
Taxonomy
Topics: Image Retrieval and Classification Techniques · Visual Attention and Saliency Detection
