Beyond Frequency: Seeing Subtle Cues Through the Lens of Spatial Decomposition for Fine-Grained Visual Classification
Qin Xu, Lili Zhu, Xiaoxia Cheng, Bo Jiang

TL;DR
This paper introduces SCOPE, a novel adaptive spatial decomposition method that enhances fine-grained visual classification by dynamically capturing subtle details and semantic features, surpassing fixed frequency-based approaches.
Contribution
The paper proposes SCOPE, a new adaptive spatial decomposition framework with modules for detail enhancement and semantic refinement, improving flexibility over traditional frequency domain methods.
Findings
Achieves state-of-the-art results on four FGVC benchmarks.
Effectively captures subtle visual cues and semantic information.
Outperforms fixed basis frequency methods.
Abstract
The crux of fine-grained visual classification (FGVC) lies in capturing discriminative, class-specific cues that correspond to subtle visual characteristics. Recently, approaches based on frequency decomposition and frequency transforms have attracted considerable interest owing to their ability to mine discriminative cues. However, frequency-domain methods rely on fixed basis functions, so they lack adaptability to image content and cannot dynamically adjust feature extraction to the discriminative requirements of different images. To address this, we propose a novel method for FGVC, named Subtle-Cue Oriented Perception Engine (SCOPE), which adaptively enhances the representational capability of low-level details and high-level semantics in the spatial domain, breaking through the limitations of fixed scales in the frequency domain and improving the flexibility of…
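The contrast the abstract draws between fixed basis functions and content-adaptive decomposition can be illustrated with a toy 1-D example. The sketch below is not SCOPE and does not reproduce any module from the paper; it is only an analogy under stated assumptions. A fixed box filter stands in for a decomposition whose weights never change with the input, and a bilateral-style filter stands in for a spatial decomposition whose weights depend on local content. The function names `fixed_split` and `adaptive_split`, and the parameters `radius` and `sigma`, are hypothetical.

```python
import math


def fixed_split(signal, radius=1):
    """Split a signal into base + detail with a fixed box filter.

    The weights are identical at every position, regardless of content
    (a stand-in for a fixed-basis decomposition; illustrative only).
    """
    n = len(signal)
    base = []
    for i in range(n):
        # Clamp indices at the borders (edge replication).
        window = [signal[min(max(j, 0), n - 1)]
                  for j in range(i - radius, i + radius + 1)]
        base.append(sum(window) / len(window))
    detail = [s - b for s, b in zip(signal, base)]
    return base, detail


def adaptive_split(signal, radius=1, sigma=0.5):
    """Split a signal into base + detail with content-dependent weights.

    Each neighbor is weighted by its similarity to the center value
    (bilateral-style), so the filter adapts to the local content.
    This is an assumed illustration, not the paper's method.
    """
    n = len(signal)
    base = []
    for i in range(n):
        num = 0.0
        den = 0.0
        for j in range(i - radius, i + radius + 1):
            v = signal[min(max(j, 0), n - 1)]
            w = math.exp(-((v - signal[i]) ** 2) / (2 * sigma ** 2))
            num += w * v
            den += w
        base.append(num / den)  # den >= 1 because the center weight is 1
    detail = [s - b for s, b in zip(signal, base)]
    return base, detail


# A step edge: the kind of sharp local structure a fixed filter blurs.
signal = [0.0, 0.0, 0.0, 10.0, 10.0, 10.0]
fixed_base, fixed_detail = fixed_split(signal)
adaptive_base, adaptive_detail = adaptive_split(signal)
```

In this toy setting the fixed filter smears the step across its neighbors, while the adaptive filter leaves the values on each side of the step almost unchanged, because dissimilar neighbors receive near-zero weight. Both splits remain exactly invertible, since the base and detail components sum back to the input.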
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Visual Attention and Saliency Detection · Face Recognition and Perception
