Context-Semantic Quality Awareness Network for Fine-Grained Visual Categorization

Qin Xu; Sitong Li; Jiahui Wang; Bo Jiang; Jinhui Tang

arXiv:2403.10298·cs.CV·March 18, 2024·2 cites

TL;DR

This paper introduces CSQA-Net, a novel weakly supervised network for fine-grained visual categorization that enhances discriminative feature extraction by modeling spatial context and semantic quality, leading to improved performance.

Contribution

The paper proposes a new multi-part, multi-scale cross-attention module and a semantic quality evaluation module to better capture and supervise discriminative features in FGVC.

Findings

01. CSQA-Net outperforms state-of-the-art methods on four FGVC datasets.

02. The MPMSCA module effectively models spatial relationships for finer details.

03. The MLSQE module improves hierarchical semantic supervision.

Abstract

Exploring and mining subtle yet distinctive features between sub-categories with similar appearances is crucial for fine-grained visual categorization (FGVC). However, less effort has been devoted to assessing the quality of extracted visual representations. Intuitively, the network may struggle to capture discriminative features from low-quality samples, which leads to a significant decline in FGVC performance. To tackle this challenge, we propose a weakly supervised Context-Semantic Quality Awareness Network (CSQA-Net) for FGVC. In this network, to model the spatial contextual relationship between rich part descriptors and global semantics for capturing more discriminative details within the object, we design a novel multi-part and multi-scale cross-attention (MPMSCA) module. Before feeding to the MPMSCA module, the part navigator is developed to address the scale confusion problems…
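
To make the cross-attention idea in the abstract concrete, here is a minimal sketch of generic scaled dot-product cross-attention in plain Python. It is not the paper's MPMSCA module: the choice of part descriptors as queries and global tokens as keys and values, the absence of learned projections, and all names and toy values (`cross_attention`, `part_descriptors`, `global_tokens`) are assumptions made for illustration only.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention with no learned projections.

    Each query attends over all keys; the output for a query is the
    attention-weighted sum of the value vectors.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs, weights = [], []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        w = softmax(scores)
        out = [sum(wj * v[c] for wj, v in zip(w, values))
               for c in range(len(values[0]))]
        weights.append(w)
        outputs.append(out)
    return outputs, weights


# Hypothetical toy data: 2 part descriptors and 3 global tokens, dimension 4.
part_descriptors = [[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0]]
global_tokens = [[4.0, 0.0, 0.0, 0.0],
                 [0.0, 4.0, 0.0, 0.0],
                 [0.0, 0.0, 4.0, 0.0]]

# Assumption: parts act as queries, global tokens as both keys and values.
refined, attn = cross_attention(part_descriptors, global_tokens, global_tokens)
```

In this toy setup each part descriptor puts most of its attention on the global token it aligns with, so `refined` holds one context-weighted vector per part. A multi-scale variant would presumably repeat this over part features at several resolutions, but the abstract does not specify how.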

Taxonomy

Topics: Image Retrieval and Classification Techniques · Visual Attention and Saliency Detection