Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach
Mir Rayat Imtiaz Hossain, Mennatullah Siam, Leonid Sigal, James J. Little

TL;DR
This paper introduces a multi-scale visual prompting method for generalized few-shot semantic segmentation using transformer decoders, achieving state-of-the-art results without test-time optimization by learning prompts and employing a causal attention mechanism.
Contribution
It proposes a novel visual prompting approach with a unidirectional causal attention mechanism for dense segmentation, improving performance on both novel and base categories in few-shot settings.
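The unidirectional attention idea can be illustrated with a small mask over two groups of prompts. The sketch below is not the authors' implementation. It assumes, for illustration only, that novel-class prompts may attend to base-class prompts while base-class prompts never attend to novel ones. The function names, the prompt counts, and the single-head attention without learned projections are all hypothetical simplifications.

```python
import numpy as np


def unidirectional_prompt_mask(num_base, num_novel):
    """Boolean mask (True = attention allowed); rows are queries, columns are keys.

    Assumed direction (not confirmed by the summary above): novel prompts
    see base and novel prompts, base prompts see base prompts only.
    """
    n = num_base + num_novel
    mask = np.zeros((n, n), dtype=bool)
    mask[:num_base, :num_base] = True  # base queries -> base keys only
    mask[num_base:, :] = True          # novel queries -> all keys
    return mask


def masked_self_attention(prompts, mask):
    """Single-head scaled dot-product self-attention over prompt vectors.

    Learned query/key/value projections are omitted to keep the sketch short.
    """
    d = prompts.shape[-1]
    scores = prompts @ prompts.T / np.sqrt(d)
    scores = np.where(mask, scores, -np.inf)       # block disallowed pairs
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)                       # exp(-inf) becomes 0
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ prompts, weights


# Hypothetical sizes: 3 base prompts, 2 novel prompts, 8-dimensional features.
rng = np.random.default_rng(0)
base = rng.normal(size=(3, 8))
novel_a = rng.normal(size=(2, 8))
novel_b = rng.normal(size=(2, 8))
mask = unidirectional_prompt_mask(3, 2)

out_a, weights_a = masked_self_attention(np.vstack([base, novel_a]), mask)
out_b, _ = masked_self_attention(np.vstack([base, novel_b]), mask)
```

Under this assumed direction, the first three rows of `out_a` and `out_b` are identical: changing the novel prompts cannot alter the base prompts' outputs, which is one way such a mask could protect base-class behavior while novel prompts still draw on base-class context.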
Findings
State-of-the-art GFSS performance on COCO-20i and Pascal-5i datasets.
Effective use of learned visual prompts without test-time optimization.
Transductive prompt tuning further enhances segmentation accuracy.
Abstract
The emergence of attention-based transformer models has led to their extensive use in various tasks, due to their superior generalization and transfer properties. Recent research has demonstrated that such models, when prompted appropriately, are excellent for few-shot inference. However, such techniques are under-explored for dense prediction tasks like semantic segmentation. In this work, we examine the effectiveness of prompting a transformer-decoder with learned visual prompts for the generalized few-shot segmentation (GFSS) task. Our goal is to achieve strong performance not only on novel categories with limited examples, but also to retain performance on base categories. We propose an approach to learn visual prompts with limited examples. These learned visual prompts are used to prompt a multiscale transformer decoder to facilitate accurate dense predictions. Additionally, we…
Taxonomy
Topics: Image Processing Techniques and Applications · Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques
Methods: Balanced Selection
