ProSAM: Enhancing the Robustness of SAM-based Visual Reference Segmentation with Probabilistic Prompts

Xiaoqi Wang; Clint Sebastian; Wenbin He; Liu Ren

arXiv:2506.21835 · cs.CV · August 5, 2025

TL;DR

ProSAM introduces a probabilistic prompt encoder for SAM-based visual reference segmentation. By avoiding prompts at object boundaries, it improves robustness and stability, and it outperforms existing methods on standard benchmarks.

Contribution

ProSAM proposes a variational prompt encoder that predicts distributions over prompts, improving robustness in SAM-based visual reference segmentation.
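The idea of an encoder that predicts prompt distributions can be illustrated with a minimal sketch of a variational prompt head. This is not ProSAM's implementation: for illustration only, we assume the encoder maps a reference feature to a diagonal Gaussian over a prompt embedding, samples from it with the reparameterization trick, and regularizes it with a KL term toward a standard normal. The helper names (`encode_prompt_distribution`, `sample_prompt`, `kl_to_standard_normal`) and the linear heads are hypothetical; this page does not specify the paper's architecture, dimensions, or loss.

```python
import math
import random
from typing import List, Tuple


def encode_prompt_distribution(feature: List[float],
                               w_mu: List[List[float]],
                               w_logvar: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Hypothetical linear heads: map a reference feature to the mean and
    log-variance of a diagonal Gaussian over a prompt embedding."""
    mu = [sum(w * f for w, f in zip(row, feature)) for row in w_mu]
    logvar = [sum(w * f for w, f in zip(row, feature)) for row in w_logvar]
    return mu, logvar


def sample_prompt(mu: List[float], logvar: List[float], rng: random.Random) -> List[float]:
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu: List[float], logvar: List[float]) -> float:
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))


# Usage with toy (assumed) weights: a 2-d feature mapped to a 2-d prompt embedding.
rng = random.Random(0)
mu, logvar = encode_prompt_distribution([1.0, 2.0],
                                        [[0.5, 0.0], [0.0, 0.25]],
                                        [[0.0, 0.0], [0.0, 0.0]])
prompt = sample_prompt(mu, logvar, rng)   # one stochastic prompt embedding
print(mu, kl_to_standard_normal(mu, logvar))  # [0.5, 0.5] 0.25
```

Sampling from a predicted distribution, rather than emitting a single point estimate, lets the encoder express uncertainty about where a prompt should go; how ProSAM turns this into prompts that avoid boundaries is described in the full paper, not on this page.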

Findings

1. Outperforms state-of-the-art methods on the Pascal-5i and COCO-20i datasets
2. Achieves more stable and robust segmentation results
3. Addresses the prompt boundary instability of existing methods

Abstract

Recent advances in large foundation models have driven the success of open-set image segmentation, a task focused on segmenting objects beyond predefined categories. Among the various prompt types (such as points, boxes, texts, and visual references), visual reference segmentation stands out for its unique flexibility and strong zero-shot capabilities. Recently, several SAM-based methods have made notable progress on this task by automatically generating prompts to guide SAM. However, because of a suboptimal prompt encoder, these methods often generate prompts at the boundaries of target regions, which results in instability and reduced robustness. In this work, we introduce ProSAM, a simple but effective method that addresses the stability challenges we identified in existing SAM-based visual reference segmentation approaches. By learning a variational prompt encoder to predict multivariate…
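The claim that boundary prompts cause instability can be made concrete with a toy example. The sketch below is our own illustration, not an experiment from the paper: the target region is a hypothetical axis-aligned box standing in for a mask, and Gaussian jitter stands in for small changes in where a generated point prompt lands. The function names and all numbers are assumptions chosen for illustration.

```python
import random
from typing import Tuple

Box = Tuple[float, float, float, float]  # hypothetical (x0, y0, x1, y1) target region


def inside_box(x: float, y: float, box: Box) -> bool:
    """True if (x, y) lies inside the axis-aligned box."""
    x0, y0, x1, y1 = box
    return x0 <= x <= x1 and y0 <= y <= y1


def miss_rate(point: Tuple[float, float], box: Box, jitter: float,
              trials: int, seed: int = 0) -> float:
    """Fraction of Gaussian-jittered copies of `point` that land outside `box`."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        x = point[0] + rng.gauss(0.0, jitter)
        y = point[1] + rng.gauss(0.0, jitter)
        if not inside_box(x, y, box):
            misses += 1
    return misses / trials


target = (0.0, 0.0, 10.0, 10.0)   # toy stand-in for a target mask
boundary_prompt = (10.0, 5.0)     # lies exactly on the right edge
interior_prompt = (5.0, 5.0)      # lies at the centre
print(miss_rate(boundary_prompt, target, jitter=0.5, trials=10_000))  # roughly 0.5
print(miss_rate(interior_prompt, target, jitter=0.5, trials=10_000))  # 0.0
```

A prompt on the edge flips to the wrong side about half the time under tiny perturbations, while an interior prompt ten standard deviations from every edge essentially never does. This is the intuition behind preferring prompts away from region boundaries; it does not reproduce how ProSAM itself selects prompts.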

Taxonomy

Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning

Methods: Segment Anything Model