Zero Shot Context-Based Object Segmentation using SLIP (SAM+CLIP)

Saaketh Koundinya Gundavarapu; Arushi Arora; Shreya Agarwal

arXiv:2405.07284 · cs.CV · May 27, 2024 · 1 citation

TL;DR

SLIP combines SAM and CLIP to enable zero-shot, context-aware object segmentation based on text prompts without prior class-specific training.

Contribution

The paper introduces SLIP, an architecture that integrates SAM with CLIP to perform zero-shot object segmentation from text prompts, using CLIP image-text representations fine-tuned on a Pokemon dataset.
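One common way to let a text prompt drive SAM, and a plausible reading of the description above though this page does not confirm it as the authors' exact pipeline: SAM proposes candidate masks, each masked region is encoded with CLIP's image encoder, and the mask whose embedding is most similar to the prompt's CLIP text embedding is returned. The sketch below covers only that selection step on precomputed embeddings. `select_mask` and the toy vectors are hypothetical; no SAM or CLIP API is called.

```python
import math
from typing import Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def select_mask(mask_embeddings: Sequence[Sequence[float]],
                text_embedding: Sequence[float]) -> Tuple[int, float]:
    """Hypothetical helper: return (index, score) of the candidate mask whose
    image embedding best matches the text-prompt embedding.

    Assumption: mask_embeddings holds one CLIP-style image embedding per
    SAM candidate mask (e.g. from encoding the cropped or masked region).
    """
    if not mask_embeddings:
        raise ValueError("need at least one candidate mask")
    scores = [cosine(e, text_embedding) for e in mask_embeddings]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Toy 3-d embeddings standing in for real CLIP outputs.
masks = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.6, 0.8, 0.0]]
prompt = [0.0, 1.0, 0.0]
idx, score = select_mask(masks, prompt)
print(idx, round(score, 3))  # prints: 1 1.0
```

Cosine similarity is used because CLIP compares L2-normalized embeddings; a real pipeline might also apply a score threshold so that no mask is returned when nothing in the image matches the prompt.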

Findings

01. Effective zero-shot segmentation based on textual cues
02. Enhanced versatility and context-awareness in object segmentation
03. Successful integration of CLIP's capabilities into SAM

Abstract

We present SLIP (SAM+CLIP), an enhanced architecture for zero-shot object segmentation. SLIP combines the Segment Anything Model (SAM) (Kirillov et al., 2023) with the Contrastive Language-Image Pretraining (CLIP) (Radford et al., 2021). By incorporating text prompts into SAM using CLIP, SLIP enables object segmentation without prior training on specific classes or categories. We fine-tune CLIP on a Pokemon dataset, allowing it to learn meaningful image-text representations. SLIP demonstrates the ability to recognize and segment objects in images based on contextual information from text prompts, expanding the capabilities of SAM for versatile object segmentation. Our experiments demonstrate the effectiveness of the SLIP architecture in segmenting objects in images based on textual cues. The integration of CLIP's text-image understanding capabilities into SAM expands the…
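The abstract says CLIP is fine-tuned on a Pokemon dataset but does not state the training objective. Assuming the fine-tuning keeps CLIP's standard symmetric contrastive loss (Radford et al., 2021), the loss for one batch of matched image-text pairs can be sketched in pure Python as below. `clip_loss` is a hypothetical helper, and the fixed temperature of 0.07 (CLIP's initial value, which CLIP itself learns during training) is an assumption.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    """L2-normalize a vector so dot products become cosine similarities."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def _log_softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def clip_loss(image_emb: Sequence[Sequence[float]],
              text_emb: Sequence[Sequence[float]],
              temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss in the style of CLIP.

    Image i and text i form the positive pair; every other pairing in the
    batch acts as a negative. Assumption: fixed (not learned) temperature.
    """
    if len(image_emb) != len(text_emb) or not image_emb:
        raise ValueError("need equally sized, non-empty batches")
    imgs = [_normalize(v) for v in image_emb]
    txts = [_normalize(v) for v in text_emb]
    n = len(imgs)
    # logits[i][j]: scaled similarity between image i and text j
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image-to-text direction: cross-entropy over rows, targets on the diagonal
    loss_i2t = -sum(_log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image direction: cross-entropy over columns
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(_log_softmax(cols[j])[j] for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


aligned = clip_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = clip_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned, swapped)
```

With correctly matched pairs the loss is close to 0, while swapping the captions drives it to roughly 1/0.07 ≈ 14.29, which is the signal that pulls each image toward its own caption during fine-tuning.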

Code & Models

Repositories

tommarvoloriddle/SLIP
PyTorch · Official

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Medical Image Segmentation Techniques

Methods: Contrastive Language-Image Pre-training · Segment Anything Model