Personalize Segment Anything Model with One Shot

Renrui Zhang; Zhengkai Jiang; Ziyu Guo; Shilin Yan; Junting Pan; Xianzheng Ma; Hao Dong; Peng Gao; Hongsheng Li

arXiv:2305.03048 · cs.CV · October 5, 2023 · 65 cites

TL;DR

This paper introduces PerSAM, a training-free method to personalize the Segment Anything Model for specific visual concepts using only one image, with an optional quick fine-tuning step for improved accuracy, demonstrated on a new dataset and applications.

Contribution

The paper proposes PerSAM, a novel training-free personalization approach for SAM, and PerSAM-F, a one-shot fine-tuning method that requires minimal training time.

Findings

01 Effective personalization of SAM with a single image.

02 Competitive performance on video object segmentation.

03 Enhanced text-to-image generation with personalized models.

Abstract

Driven by large-data pre-training, the Segment Anything Model (SAM) has been demonstrated as a powerful and promptable framework, revolutionizing segmentation models. Despite this generality, customizing SAM for specific visual concepts without manual prompting is underexplored, e.g., automatically segmenting your pet dog in different images. In this paper, we propose a training-free Personalization approach for SAM, termed PerSAM. Given only a single image with a reference mask, PerSAM first localizes the target concept by a location prior, and then segments it within other images or videos via three techniques: target-guided attention, target-semantic prompting, and cascaded post-refinement. In this way, we effectively adapt SAM for private use without any training. To further alleviate the mask ambiguity, we present an efficient one-shot fine-tuning variant, PerSAM-F. Freezing the…
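The location prior mentioned in the abstract can be illustrated with a small sketch. The abstract does not say how the prior is computed, so everything here is an assumption made for illustration: the target is summarized by averaging reference features inside the mask, each test-image location is scored by cosine similarity to that summary, and the highest- and lowest-scoring locations are taken as candidate positive and negative point prompts. The function names (`target_embedding`, `location_prior`) and the toy 2x2 feature maps are hypothetical and are not taken from the official repository.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)

def target_embedding(ref_feats, ref_mask):
    """Average the reference features over the locations inside the mask.

    ref_feats: H x W grid of feature vectors (nested lists).
    ref_mask:  H x W grid of 0/1 values marking the target object.
    """
    dim = len(ref_feats[0][0])
    total = [0.0] * dim
    count = 0
    for feat_row, mask_row in zip(ref_feats, ref_mask):
        for feat, inside in zip(feat_row, mask_row):
            if inside:
                total = [t + x for t, x in zip(total, feat)]
                count += 1
    if count == 0:
        raise ValueError("reference mask is empty")
    return [t / count for t in total]

def location_prior(test_feats, target):
    """Score every test location against the target embedding.

    Returns the similarity map plus the (row, col) of the most and
    least similar locations (assumed positive / negative prompts).
    """
    sim = [[cosine(feat, target) for feat in row] for row in test_feats]
    coords = [(r, c) for r in range(len(sim)) for c in range(len(sim[0]))]
    positive = max(coords, key=lambda rc: sim[rc[0]][rc[1]])
    negative = min(coords, key=lambda rc: sim[rc[0]][rc[1]])
    return sim, positive, negative

# Toy data: 2x2 grids of 2-dimensional features (illustrative only).
ref_feats = [[[1.0, 0.0], [0.9, 0.1]],
             [[0.0, 1.0], [0.1, 0.9]]]
ref_mask = [[1, 1],
            [0, 0]]
test_feats = [[[0.0, 1.0], [0.2, 0.8]],
              [[1.0, 0.1], [0.5, 0.5]]]

target = target_embedding(ref_feats, ref_mask)
sim_map, positive_point, negative_point = location_prior(test_feats, target)
print(positive_point, negative_point)
```

In a real pipeline the feature grids would come from an image encoder and the selected points would be passed to a promptable segmenter; the toy lists above only demonstrate the scoring and selection logic.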

Peer Reviews

Decision · ICLR 2024 poster

Reviewer 01 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

Single-shot image segmentation is an important problem with many downstream uses in real-world applications ranging from design to healthcare. The paper introduces a simple but effective technique to solve it by leveraging the powerful Segment Anything Model (SAM) [1]. The introduced method is called Personalization approach for SAM (PerSAM), and it takes as input a single example image of the desired object we want to segment, and its corresponding seg

Weaknesses

Overall, it is a nicely written paper with good results. However, it is somewhat lacking in its quantitative evaluation. The choice of evaluation datasets is limited. It would be worthwhile to also see the performance of the proposed method for one-shot segmentation on additional, more challenging datasets such as MS-COCO, ADE20K, and Cityscapes, and to compare with more powerful existing state-of-the-art models. The comparison is also lacking. It would be nice to compare against methods that do

Reviewer 02 · Rating 6 (marginally above the acceptance threshold) · Confidence 5

Strengths

1. The paper is well organised, with clear motivation, and is easy to understand. The illustration and visualisation figures are well presented. 2. PerSAM is training-free and computationally efficient, and the ablation experiments for PerSAM in Tables 4, 5, and 6 are extensive. 3. The paper demonstrates good performance not only on the constructed PerSeg benchmark, but also on many image/video segmentation benchmarks.

Weaknesses

1. In the appendix, the authors mention using DINOv2 features. Can the authors also provide the results in Tables 2 and 3 using the default image encoder features of SAM? 2. What is the running speed and memory consumption of PerSAM compared to SAM? 3. In Table 2, can the authors provide a performance comparison to SAM-PT [a]? [a] is a related work on adapting SAM for video object segmentation. [a] SAM-PT: Extending SAM to zero-shot video segmentation with point-based tracking. arXiv, 2023. 4.

Reviewer 03 · Rating 8 (accept, good paper) · Confidence 5

Strengths

* This paper is well written and easy to understand.
* This paper is the first to study the interesting task of customizing a general-purpose segmentation model for personalized scenarios, and it presents a highly effective method to address this task.
* The method is simple and easy to follow. The proposed PerSAM can guide SAM to segment target objects by three effective training-free techniques. By tuning 2 parameters within 10 seconds, PerSAM-F efficiently alleviates the mask ambiguity issue

Weaknesses

The feature semantics of SAM might be limited due to SAM's class-agnostic training. While PerSAM and PerSAM-F demonstrate promising performance in personalized object segmentation, their effectiveness may be constrained by SAM's feature semantics in scenarios involving multiple different objects. This may require additional training to enable better transfer of SAM's features to downstream tasks. Alternatively, other representations with stronger semantics, such as CLIP, could be introduced.

Code & Models

Repositories

zrrskywalker/personalize-sam
PyTorch · Official

Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques

Methods: Segment Anything Model · Test · Diffusion