AFANet: Adaptive Frequency-Aware Network for Weakly-Supervised Few-Shot Semantic Segmentation
Jiaqi Ma, Guo-Sen Xie, Fang Zhao, Zechao Li

TL;DR
AFANet introduces an adaptive frequency-aware approach for weakly-supervised few-shot semantic segmentation. It uses frequency decoupling and cross-modal guidance to improve segmentation accuracy with only image-level annotations.
Contribution
The paper proposes a cross-granularity frequency-aware module (CFM) and a CLIP-guided spatial-adapter module (CSM) for weakly-supervised few-shot semantic segmentation, integrating frequency and cross-modal information in an online manner.
Findings
Achieves state-of-the-art results on the PASCAL-5^i and COCO-20^i datasets.
Effectively decouples high- and low-frequency information for better segmentation.
Utilizes online CLIP-guided adaptation for enriched semantic understanding.
Abstract
Few-shot learning aims to recognize novel concepts by leveraging prior knowledge learned from a few samples. However, for visually intensive tasks such as few-shot semantic segmentation, pixel-level annotations are time-consuming and costly. Therefore, in this paper, we utilize the more challenging image-level annotations and propose an adaptive frequency-aware network (AFANet) for weakly-supervised few-shot semantic segmentation (WFSS). Specifically, we first propose a cross-granularity frequency-aware module (CFM) that decouples RGB images into high-frequency and low-frequency distributions and further optimizes semantic structural information by realigning them. Unlike most existing WFSS methods using the textual information from the multi-modal language-vision model, e.g., CLIP, in an offline learning manner, we further propose a CLIP-guided spatial-adapter module (CSM), which…
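The abstract describes CFM only at a high level. To give a rough sense of what "decoupling an image into high- and low-frequency components" can mean, below is a minimal NumPy sketch that uses a 2-D FFT with an ideal circular low-pass mask. It is not the paper's CFM: the function name `frequency_decouple`, the circular mask, and the `radius` cutoff are hypothetical choices for illustration, and the realignment step is not modeled.

```python
import numpy as np


def frequency_decouple(image: np.ndarray, radius: int = 8) -> tuple[np.ndarray, np.ndarray]:
    """Split an H x W (or H x W x C) image into low- and high-frequency parts.

    Hypothetical illustration: an ideal circular low-pass mask in the centred
    2-D spectrum. `radius` is an arbitrary cutoff, not a value from the paper.
    """
    h, w = image.shape[:2]
    # 2-D FFT over the spatial axes, zero frequency shifted to the centre.
    spectrum = np.fft.fftshift(np.fft.fft2(image, axes=(0, 1)), axes=(0, 1))

    # Boolean mask that keeps frequencies within `radius` of the centre.
    yy, xx = np.ogrid[:h, :w]
    cy, cx = h // 2, w // 2
    mask = ((yy - cy) ** 2 + (xx - cx) ** 2) <= radius ** 2
    if image.ndim == 3:
        mask = mask[..., None]  # broadcast the mask over channels

    # Inverse FFT of the masked spectrum gives the low-frequency component.
    low_spec = np.fft.ifftshift(spectrum * mask, axes=(0, 1))
    low = np.fft.ifft2(low_spec, axes=(0, 1)).real

    # The high-frequency component is the residual, so low + high == image.
    high = image - low
    return low, high
```

Because the high band is defined as the residual, `low + high` reproduces the input exactly. The low band keeps the coarse layout and the mean intensity, and the high band keeps edges and fine texture. A learned module such as CFM would typically replace the fixed mask with something adaptive.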
Taxonomy
Topics: Speech Recognition and Synthesis · Geophysical Methods and Applications · Image Processing and 3D Reconstruction
Methods: Contrastive Language-Image Pre-training
