Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach

Mir Rayat Imtiaz Hossain; Mennatullah Siam; Leonid Sigal; James J. Little

arXiv:2404.11732·cs.CV·April 19, 2024·1 cites

TL;DR

This paper introduces a multi-scale visual prompting method for generalized few-shot semantic segmentation using transformer decoders, achieving state-of-the-art results without test-time optimization by learning prompts and employing a causal attention mechanism.

Contribution

It proposes a novel visual prompting approach with a unidirectional causal attention mechanism for dense segmentation, improving performance on both novel and base categories in few-shot settings.
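The unidirectional attention idea can be illustrated with a small sketch. This is not the authors' implementation: it assumes that novel prompts may attend to base prompts while base prompts are masked from seeing novel prompts (the direction is an assumption, since this summary does not state it), and it omits learned query/key/value projections for brevity. The function name `unidirectional_prompt_attention` and the array shapes are hypothetical.

```python
import numpy as np

def unidirectional_prompt_attention(base, novel):
    """Single-head attention over [base; novel] prompts with a one-way mask.

    base:  (n_base, d) array of base-class prompts
    novel: (n_novel, d) array of novel-class prompts
    Returns (updated_prompts, attention_weights).

    Assumption (not confirmed by the summary): base queries cannot attend
    to novel keys, while novel queries can attend to every prompt.
    """
    prompts = np.concatenate([base, novel], axis=0)
    n_base, n_total = base.shape[0], prompts.shape[0]
    d = prompts.shape[1]

    # Scaled dot-product scores; rows are queries, columns are keys.
    scores = prompts @ prompts.T / np.sqrt(d)

    # One-way mask: block base rows from looking at novel columns.
    allowed = np.ones((n_total, n_total), dtype=bool)
    allowed[:n_base, n_base:] = False
    scores = np.where(allowed, scores, -np.inf)

    # Numerically stable softmax over the keys of each query.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)

    return weights @ prompts, weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.normal(size=(3, 4))
    novel = rng.normal(size=(2, 4))
    updated, weights = unidirectional_prompt_attention(base, novel)
    # The top-right block (base queries, novel keys) is all zeros.
    print(weights.round(3))
```

Under this masking, the updated base prompts do not depend on the novel prompts at all, which is one plausible way to keep base-class behaviour stable while novel prompts borrow context from the base ones.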

Findings

01 State-of-the-art GFSS performance on the COCO-20i and Pascal-5i datasets.

02 Effective use of learned visual prompts without test-time optimization.

03 Transductive prompt tuning further enhances segmentation accuracy.

Abstract

The emergence of attention-based transformer models has led to their extensive use in various tasks, due to their superior generalization and transfer properties. Recent research has demonstrated that such models, when prompted appropriately, are excellent for few-shot inference. However, such techniques are under-explored for dense prediction tasks like semantic segmentation. In this work, we examine the effectiveness of prompting a transformer-decoder with learned visual prompts for the generalized few-shot segmentation (GFSS) task. Our goal is to achieve strong performance not only on novel categories with limited examples, but also to retain performance on base categories. We propose an approach to learn visual prompts with limited examples. These learned visual prompts are used to prompt a multiscale transformer decoder to facilitate accurate dense predictions. Additionally, we…

Code & Models

Repositories

rayat137/VisualPromptGFSS (PyTorch)

Taxonomy

Topics: Image Processing Techniques and Applications · Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques

Methods: Balanced Selection