Unlocking the Power of SAM 2 for Few-Shot Segmentation
Qianxiong Xu, Lanyun Zhu, Xuanyi Liu, Guosheng Lin, Cheng Long, Ziyue Li, Rui Zhao

TL;DR
This paper adapts SAM 2's video segmentation capabilities to few-shot segmentation (FSS) through a pseudo prompt generator, iterative memory refinement, and support-calibrated attention, reporting a 4.2% gain in 1-shot mIoU over the baseline.
Contribution
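The paper introduces a framework that adapts SAM 2 to FSS. Its new modules address two problems: the identity mismatch between the objects SAM 2 was trained to match across video frames and the support/query objects in FSS, and the limited accuracy of the encoded memories.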
Findings
1-shot mIoU improved by 4.2% over baseline
Effective handling of query background features
Validated on PASCAL-5i and COCO-20i datasets
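For readers unfamiliar with the metric behind the first finding, the sketch below shows one common way mIoU is computed in FSS evaluation: intersection and union are accumulated per class over all test episodes, and the per-class IoUs are then averaged. The listing does not state the exact evaluation protocol, so this convention is an assumption, and the function name `mean_iou` and the episode tuple layout are hypothetical.

```python
from collections import defaultdict

import numpy as np


def mean_iou(episodes):
    """Compute class-averaged IoU over a set of test episodes.

    episodes: iterable of (class_id, pred_mask, gt_mask), where both masks
    are array-likes of the same shape with truthy values marking foreground.
    (This tuple layout is an illustrative assumption.)
    """
    inter = defaultdict(int)
    union = defaultdict(int)
    for class_id, pred, gt in episodes:
        pred = np.asarray(pred, dtype=bool)
        gt = np.asarray(gt, dtype=bool)
        # Accumulate pixel counts per class across episodes.
        inter[class_id] += int(np.logical_and(pred, gt).sum())
        union[class_id] += int(np.logical_or(pred, gt).sum())

    # Skip classes whose union is empty, since their IoU is undefined.
    ious = [inter[c] / union[c] for c in union if union[c] > 0]
    if not ious:
        raise ValueError("no class has a non-empty union; mIoU is undefined")
    return sum(ious) / len(ious)


if __name__ == "__main__":
    demo = [
        (0, [[1, 1], [0, 0]], [[1, 0], [0, 0]]),  # class 0: IoU = 1/2
        (1, [[0, 1], [0, 1]], [[0, 1], [0, 1]]),  # class 1: IoU = 2/2
    ]
    print(mean_iou(demo))
```

Accumulating counts before dividing keeps small objects from dominating the score of a class, which is why this variant is often preferred over averaging per-image IoUs.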
Abstract
Few-Shot Segmentation (FSS) aims to learn class-agnostic segmentation on few classes to segment arbitrary classes, but at the risk of overfitting. To address this, some methods use the well-learned knowledge of foundation models (e.g., SAM) to simplify the learning process. Recently, SAM 2 has extended SAM by supporting video segmentation, whose class-agnostic matching ability is useful to FSS. A simple idea is to encode support foreground (FG) features as memory, with which query FG features are matched and fused. Unfortunately, the FG objects in different frames of SAM 2's video data are always the same identity, while those in FSS are different identities, i.e., the matching step is incompatible. Therefore, we design Pseudo Prompt Generator to encode pseudo query memory, matching with query features in a compatible way. However, the memories can never be as accurate as the real ones,…
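The "simple idea" described in the abstract, encoding support foreground features as memory and matching query features against it, can be sketched with plain scaled dot-product cross-attention. This is a minimal illustration only: it is not SAM 2's memory attention module or the authors' implementation, and the names `match_and_fuse`, `query_feats`, `support_feats`, and `support_mask` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def match_and_fuse(query_feats, support_feats, support_mask):
    """Match query features against support foreground memory and fuse.

    query_feats:   (Nq, C) flattened query feature vectors
    support_feats: (Ns, C) flattened support feature vectors
    support_mask:  (Ns,)   truthy where a support location is foreground
    Returns a (Nq, C) array of fused query features.
    """
    query_feats = np.asarray(query_feats, dtype=float)
    support_feats = np.asarray(support_feats, dtype=float)
    support_mask = np.asarray(support_mask, dtype=bool)

    # Memory = support features at foreground locations only.
    memory = support_feats[support_mask]  # (M, C)
    if memory.shape[0] == 0:
        # Nothing to match against; return the query features unchanged.
        return query_feats.copy()

    # Each query location attends over all memory entries.
    scale = 1.0 / np.sqrt(query_feats.shape[1])
    attn = softmax(query_feats @ memory.T * scale, axis=-1)  # (Nq, M)

    # Residual fusion: add the attention-weighted memory to the query.
    return query_feats + attn @ memory
```

In this naive form every query location, background included, receives foreground memory content. Together with the identity mismatch the abstract describes, this helps illustrate why direct matching is not sufficient on its own.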
Taxonomy
Topics: Radiation Detection and Scintillator Technologies · Geophysical Methods and Applications · Medical Imaging Techniques and Applications
Methods: Softmax · Attention Is All You Need · Segment Anything Model
