Multi-Text Guided Few-Shot Semantic Segmentation
Qiang Jiao, Bin Yan, Yi Yang, Mengrui Shi, Qiang Zhang

TL;DR
This paper introduces MTGNet, a dual-branch framework that fuses multiple textual prompts and enhances cross-modal interaction to improve few-shot semantic segmentation, especially for complex categories with high intra-class variation.
Contribution
The paper proposes a novel multi-text guided framework with modules for textual prior refinement, semantic anchor fusion, and visual prior enhancement, addressing limitations of single-prompt methods.
Findings
Achieves 76.8% mIoU on PASCAL-5i in the 1-shot setting.
Attains 57.4% mIoU on COCO-20i in the 1-shot setting.
Shows significant improvements on categories with high intra-class variation.
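For readers unfamiliar with the metric, here is a minimal sketch of how mIoU is commonly computed in few-shot segmentation: foreground intersection and union are accumulated per class over all test episodes, and the per-class IoU values are then averaged. This summary does not state the paper's exact evaluation protocol, so the convention, the function name `mean_iou`, and the flat 0/1 mask representation are all assumptions made for illustration.

```python
from collections import defaultdict


def mean_iou(episodes):
    """Mean IoU over classes (hypothetical helper, not from the paper).

    episodes: iterable of (class_id, pred_mask, gt_mask), where each mask
    is a flat sequence of 0/1 values of equal length.
    """
    inter = defaultdict(int)  # per-class foreground intersection
    union = defaultdict(int)  # per-class foreground union
    for class_id, pred, gt in episodes:
        if len(pred) != len(gt):
            raise ValueError("prediction and ground truth differ in size")
        for p, g in zip(pred, gt):
            inter[class_id] += 1 if (p and g) else 0
            union[class_id] += 1 if (p or g) else 0
    # Classes whose union is empty are skipped to avoid division by zero.
    ious = [inter[c] / union[c] for c in union if union[c] > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy episodes: two for class "cat", one for class "dog".
episodes = [
    ("cat", [1, 1, 0, 0], [1, 0, 0, 0]),
    ("cat", [0, 1, 1, 0], [0, 1, 1, 0]),
    ("dog", [1, 0, 0, 0], [1, 1, 1, 1]),
]
score = mean_iou(episodes)
```

In this toy case "cat" accumulates 3 intersecting and 4 union pixels and "dog" 1 and 4, so the class IoUs are 0.75 and 0.25. Accumulating per class before averaging keeps a class with many episodes from dominating the score.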
Abstract
Recent CLIP-based few-shot semantic segmentation methods introduce class-level textual priors to assist segmentation, typically using a single prompt (e.g., "a photo of class"). However, these approaches often result in incomplete activation of target regions, as a single textual description cannot fully capture the semantic diversity of complex categories. Moreover, they lack explicit cross-modal interaction and are vulnerable to noisy support features, which further degrades visual prior quality. To address these issues, we propose the Multi-Text Guided Few-Shot Semantic Segmentation Network (MTGNet), a dual-branch framework that enhances segmentation performance by fusing diverse textual prompts to refine textual priors and guide the cross-modal optimization of visual priors. Specifically, we design a Multi-Textual Prior Refinement (MTPR) module that suppresses interference and aggregates…
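The incomplete-activation problem described above can be illustrated with a toy sketch. A textual prior is taken here as the cosine similarity between a text embedding and the feature at each image location. With several prompts, the per-location similarities are combined with a simple maximum. This is not the paper's MTPR module, whose details are truncated in this excerpt. The maximum is a stand-in aggregation, the vectors are hand-made 2-D toys rather than CLIP outputs, and the names `cosine` and `prior_map` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def prior_map(text_embeddings, pixel_features, reduce="max"):
    """Score each location against one or more prompt embeddings.

    Assumption: "max" (or "mean") aggregation is a placeholder for
    illustration, not the refinement used in MTGNet.
    """
    scores = []
    for feature in pixel_features:
        sims = [cosine(t, feature) for t in text_embeddings]
        if reduce == "max":
            scores.append(max(sims))
        else:
            scores.append(sum(sims) / len(sims))
    return scores


# Toy setup: two prompts that describe different aspects of one class.
prompt_a = [1, 0]
prompt_b = [0, 1]
# Three locations: one matches prompt_a, one matches prompt_b, one is background.
pixels = [[1, 0], [0, 1], [-1, 0]]

single = prior_map([prompt_a], pixels)            # second location is missed
multi = prior_map([prompt_a, prompt_b], pixels)   # both object locations score high
```

With the single prompt, the second location scores 0 even though it belongs to the object. With both prompts it scores 1, while the background location stays at 0.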
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications
