TL;DR
This paper demonstrates that a training-free approach, which pairs a frozen SAM3 foundation model with spatial concatenation of support and query images, achieves state-of-the-art results in Few-Shot Semantic Segmentation (FSS). It also finds that negative prompts can weaken target representations in few-shot settings.
Contribution
It introduces a simple, training-free method that repurposes SAM3's Promptable Concept Segmentation (PCS) capability for FSS, reports performance above many trained approaches, and analyzes how prompts affect the result.
Findings
The SAM3-based method achieves state-of-the-art FSS performance.
Negative prompts can weaken target representations in few-shot settings.
Simple spatial concatenation can enable strong cross-image reasoning.
Abstract
Few-Shot Semantic Segmentation (FSS) focuses on segmenting novel object categories from only a handful of annotated examples. Most existing approaches rely on extensive episodic training to learn transferable representations, which is both computationally demanding and sensitive to distribution shifts. In this work, we revisit FSS from the perspective of modern vision foundation models and explore the potential of Segment Anything Model 3 (SAM3) as a training-free solution. By repurposing its Promptable Concept Segmentation (PCS) capability, we adopt a simple spatial concatenation strategy that places support and query images into a shared canvas, allowing a fully frozen SAM3 to perform segmentation without any fine-tuning or architectural changes. Experiments on PASCAL-5i and COCO-20i show that this minimal design already achieves state-of-the-art performance, outperforming many…
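The spatial concatenation step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the canvas layout, the padding, or how SAM3 is prompted, so everything below is an assumption: the side-by-side layout, the helper names `concat_canvas` and `crop_query_mask`, and the `dummy_segmenter` stand-in (a simple brightness threshold, not SAM3 or its API). The point is only to show how a prediction made on the shared canvas can be mapped back to the query image.

```python
import numpy as np

def concat_canvas(support, query, pad_value=0):
    """Place support (left) and query (right) on one shared canvas.

    Hypothetical helper: the side-by-side layout and constant padding are
    assumptions, not details taken from the paper.
    Returns the canvas and the query region as (y0, x0, y1, x1).
    """
    height = max(support.shape[0], query.shape[0])
    width = support.shape[1] + query.shape[1]
    canvas = np.full((height, width, 3), pad_value, dtype=support.dtype)
    # Support image goes in the top-left corner.
    canvas[:support.shape[0], :support.shape[1]] = support
    # Query image starts where the support image ends horizontally.
    x0 = support.shape[1]
    canvas[:query.shape[0], x0:x0 + query.shape[1]] = query
    return canvas, (0, x0, query.shape[0], x0 + query.shape[1])

def crop_query_mask(canvas_mask, query_box):
    """Keep only the part of a canvas-sized mask that covers the query image."""
    y0, x0, y1, x1 = query_box
    return canvas_mask[y0:y1, x0:x1]

def dummy_segmenter(canvas):
    """Stand-in for a frozen segmentation model (NOT SAM3): bright pixels are foreground."""
    return canvas[..., 0] > 127

# Toy inputs: a dark 4x5 support image and a bright 3x6 query image.
support = np.zeros((4, 5, 3), dtype=np.uint8)
query = np.full((3, 6, 3), 255, dtype=np.uint8)

canvas, box = concat_canvas(support, query)
mask = dummy_segmenter(canvas)
query_mask = crop_query_mask(mask, box)
print(canvas.shape, box, query_mask.shape)
```

In a real pipeline the stand-in would be replaced by the frozen model, prompted with the support annotation, while the cropping step would stay the same. Because nothing is trained, the only design choices in such a sketch are the layout of the canvas and how the prompt is expressed.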
