Beyond Frequency: Seeing Subtle Cues Through the Lens of Spatial Decomposition for Fine-Grained Visual Classification

Qin Xu; Lili Zhu; Xiaoxia Cheng; Bo Jiang

arXiv:2508.06959 · cs.CV · November 14, 2025

TL;DR

This paper introduces SCOPE, a novel adaptive spatial decomposition method that enhances fine-grained visual classification by dynamically capturing subtle details and semantic features, surpassing fixed frequency-based approaches.

Contribution

The paper proposes SCOPE, a new adaptive spatial decomposition framework with modules for detail enhancement and semantic refinement, offering more flexibility than traditional frequency-domain methods.

Findings

01. Achieves state-of-the-art results on four FGVC benchmarks.

02. Effectively captures subtle visual cues and semantic information.

03. Outperforms frequency-domain methods that rely on fixed basis functions.

Abstract

The crux of fine-grained visual classification (FGVC) lies in capturing discriminative, class-specific cues that correspond to subtle visual characteristics. Recently, approaches based on frequency decomposition or frequency transforms have attracted considerable interest owing to their ability to mine discriminative cues. However, frequency-domain methods rely on fixed basis functions: they lack adaptability to image content and cannot dynamically adjust feature extraction to the discriminative requirements of different images. To address this, we propose a novel method for FGVC, named Subtle-Cue Oriented Perception Engine (SCOPE), which adaptively enhances the representational capability of low-level details and high-level semantics in the spatial domain, breaking through the limitations of fixed scales in the frequency domain and improving the flexibility of…
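To make the abstract's "fixed basis functions" critique concrete, here is a minimal NumPy sketch of a generic fixed-basis frequency split. It is an illustration under stated assumptions, not SCOPE and not any specific baseline from the paper: the orthonormal 2-D DCT, the diagonal `cutoff` band, and the helper names `dct_matrix` and `frequency_split` are all hypothetical choices made for this example. The point it shows is that the basis matrices and the frequency mask are computed from the image shape alone, so every image is decomposed by the same linear projection regardless of its content.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis of size n x n (depends only on n, never on content)."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # spatial index (columns)
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    basis[0, :] /= np.sqrt(2.0)  # rescale the DC row so the matrix is orthonormal
    return basis


def frequency_split(x: np.ndarray, cutoff: int) -> tuple[np.ndarray, np.ndarray]:
    """Split a 2-D map into low/high-frequency parts with a fixed DCT basis.

    Hypothetical helper for illustration: coefficients with (u + v) < cutoff
    count as 'low'. The basis and the mask are identical for every input of
    the same shape, i.e. the split is not content-adaptive.
    """
    h, w = x.shape
    ch, cw = dct_matrix(h), dct_matrix(w)
    coeffs = ch @ x @ cw.T  # forward 2-D DCT
    u, v = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    mask = (u + v) < cutoff  # hand-chosen, image-independent frequency band
    low = ch.T @ (coeffs * mask) @ cw  # inverse 2-D DCT of the kept band
    return low, x - low  # high-frequency part is the residual


# Usage: with cutoff=1 only the DC coefficient survives, so `low` is the
# constant mean of x and `high` holds everything else.
rng = np.random.default_rng(0)
x = rng.standard_normal((6, 10))
low, high = frequency_split(x, cutoff=1)
```

Because `frequency_split` is a fixed linear operator (the split of `a + b` equals the sum of the splits of `a` and `b`), it cannot emphasize different regions or scales for different images. That rigidity is what the abstract says SCOPE addresses by working adaptively in the spatial domain; the paper's actual modules are not specified in this excerpt.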

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Visual Attention and Saliency Detection · Face Recognition and Perception