AFANet: Adaptive Frequency-Aware Network for Weakly-Supervised Few-Shot Semantic Segmentation

Jiaqi Ma; Guo-Sen Xie; Fang Zhao; Zechao Li

arXiv:2412.17601·cs.CV·December 30, 2024

TL;DR

AFANet introduces an adaptive frequency-aware approach for weakly-supervised few-shot semantic segmentation, leveraging frequency decoupling and cross-modal guidance to improve segmentation accuracy with minimal annotations.

Contribution

The paper proposes a novel frequency-aware module and a CLIP-guided spatial adapter for enhanced weakly-supervised few-shot segmentation, integrating frequency and cross-modal information online.

Findings

01. Achieves state-of-the-art results on the Pascal-5^i and COCO-20^i datasets.

02. Effectively decouples high- and low-frequency information for better segmentation.

03. Utilizes online CLIP-guided adaptation for enriched semantic understanding.

Abstract

Few-shot learning aims to recognize novel concepts by leveraging prior knowledge learned from a few samples. However, for visually intensive tasks such as few-shot semantic segmentation, pixel-level annotations are time-consuming and costly. Therefore, in this paper, we utilize the more challenging image-level annotations and propose an adaptive frequency-aware network (AFANet) for weakly-supervised few-shot semantic segmentation (WFSS). Specifically, we first propose a cross-granularity frequency-aware module (CFM) that decouples RGB images into high-frequency and low-frequency distributions and further optimizes semantic structural information by realigning them. Unlike most existing WFSS methods using the textual information from the multi-modal language-vision model, e.g., CLIP, in an offline learning manner, we further propose a CLIP-guided spatial-adapter module (CSM), which…
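The frequency decoupling that the abstract attributes to the CFM can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which transform the CFM uses, so the FFT-based ideal low-pass mask, the function name `split_frequencies`, and the `cutoff` parameter are all assumptions chosen purely for illustration.

```python
import numpy as np


def split_frequencies(image, cutoff=0.1):
    """Split an H x W x C image into low- and high-frequency parts.

    Hypothetical helper: an ideal circular low-pass mask in the FFT domain,
    assumed here only for illustration (not taken from the paper).
    `cutoff` is a radius in cycles per pixel (0 to 0.5 along each axis).
    """
    image = np.asarray(image, dtype=np.float64)
    h, w = image.shape[:2]

    # Frequency coordinate of every FFT bin along each spatial axis.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]

    # Keep only bins whose radial frequency is at most `cutoff`.
    mask = (np.sqrt(fx ** 2 + fy ** 2) <= cutoff).astype(np.float64)

    # Filter each channel independently over the two spatial axes.
    spectrum = np.fft.fft2(image, axes=(0, 1))
    low = np.fft.ifft2(spectrum * mask[..., None], axes=(0, 1)).real

    # The high-frequency part is the residual, so low + high == image.
    high = image - low
    return low, high


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb = rng.random((32, 32, 3))  # stand-in for an RGB image in [0, 1)
    low, high = split_frequencies(rgb, cutoff=0.1)
    print(low.shape, high.shape)
```

In this sketch the low-frequency part carries smooth colour and coarse layout, while the high-frequency residual carries edges and fine texture. How AFANet realigns the two parts is described in the paper itself and is not reproduced here.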

Code & Models

Repositories

jarch-ma/AFANet (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Geophysical Methods and Applications · Image Processing and 3D Reconstruction

Methods: Contrastive Language-Image Pre-training