Visual and Textual Prior Guided Mask Assemble for Few-Shot Segmentation and Beyond
Chen Shuai, Meng Fanman, Zhang Runtong, Qiu Heqian, Li Hongliang, Wu Qingbo, Xu Linfeng

TL;DR
This paper introduces PGMA-Net, a novel few-shot segmentation model that leverages visual and textual priors with a class-agnostic mask assembly process, achieving state-of-the-art results and versatility across multiple segmentation tasks.
Contribution
The paper proposes a class-agnostic mask assembly network with diverse, plug-and-play interactions, enabling improved generalization and multi-task capabilities without extra re-training.
Findings
Achieves a state-of-the-art mIoU of 77.6 on PASCAL-5^i in the 1-shot setting.
Effective in cross-domain and zero-shot segmentation tasks.
Operates without class-specific information or additional training.
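For readers unfamiliar with the metric behind the 77.6 figure, the following is a minimal sketch of how mIoU is commonly computed in few-shot segmentation: intersections and unions are accumulated per class over all test episodes, and the per-class IoUs are then averaged. The summary does not describe the paper's exact evaluation protocol, so this convention, the function name `mean_iou`, and the toy masks are all assumptions for illustration.

```python
def mean_iou(predictions, targets, class_ids):
    """Mean IoU (in percent) over classes, assuming binary masks given as flat 0/1 lists.

    Assumed convention: intersection and union are summed per class across
    episodes before dividing, then the per-class IoUs are averaged.
    """
    inter, union = {}, {}
    for pred, gt, cls in zip(predictions, targets, class_ids):
        i = sum(1 for p, g in zip(pred, gt) if p and g)
        u = sum(1 for p, g in zip(pred, gt) if p or g)
        inter[cls] = inter.get(cls, 0) + i
        union[cls] = union.get(cls, 0) + u
    ious = [inter[c] / union[c] for c in union if union[c] > 0]
    return 100.0 * sum(ious) / len(ious) if ious else 0.0


# Toy data (not from the paper): two episodes, one per class, 4 pixels each.
predictions = [[1, 1, 0, 0], [0, 1, 1, 0]]
targets = [[1, 0, 0, 0], [0, 1, 1, 1]]
score = mean_iou(predictions, targets, ["class_a", "class_b"])
```

Here class_a has IoU 1/2 and class_b has IoU 2/3, so the score is their average expressed as a percentage.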
Abstract
Few-shot segmentation (FSS) aims to segment novel classes given only a few annotated images. Because CLIP aligns visual and textual information, integrating it can enhance the generalization ability of FSS models. However, existing CLIP-based FSS methods still suffer from predictions biased towards base classes, which is caused by class-specific feature-level interactions. To solve this issue, we propose a visual and textual Prior Guided Mask Assemble Network (PGMA-Net). It employs a class-agnostic mask assembly process to alleviate the bias, and formulates diverse tasks in a unified manner by assembling the prior through affinity. Specifically, the class-relevant textual and visual features are first transformed into a class-agnostic prior in the form of a probability map. Then, a Prior-Guided Mask Assemble Module (PGMAM)…
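The two ideas named in the abstract, turning class-relevant features into a class-agnostic probability map and then assembling that prior through affinity, can be illustrated with a rough sketch. This is not the authors' implementation: the cosine similarity, the min-max normalisation, the softmax affinity, the temperature value, and all function names below are assumptions chosen only to make the idea concrete.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)


def prior_map(pixel_feats, class_embedding):
    """Assumed prior: min-max normalised similarity of each pixel to a class embedding.

    The output holds one value in [0, 1] per pixel and no longer carries the
    class-specific feature itself, only a probability-like map.
    """
    sims = [cosine(f, class_embedding) for f in pixel_feats]
    lo, hi = min(sims), max(sims)
    return [(s - lo) / (hi - lo + 1e-8) for s in sims]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def assemble(query_feats, source_feats, source_prior, temperature=0.1):
    """Assumed assembly: each query pixel takes an affinity-weighted average of the prior."""
    mask = []
    for q in query_feats:
        weights = softmax([cosine(q, s) / temperature for s in source_feats])
        mask.append(sum(w * p for w, p in zip(weights, source_prior)))
    return mask


# Toy example: three "pixels" with 2-D features and a hypothetical text embedding.
pixel_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
text_embedding = [1.0, 0.0]
prior = prior_map(pixel_feats, text_embedding)
mask = assemble(pixel_feats, pixel_feats, prior)
```

In this sketch the assembly step only ever sees affinities and prior values, never class-specific features, which is one way to read the "class-agnostic" claim. Swapping the text embedding for a support-image prototype would give a visual prior through the same code path.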
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Image Processing Techniques and Applications · Advanced Neural Network Applications
Methods: Contrastive Language-Image Pre-training · Balanced Selection
