Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively

Haobo Yuan; Xiangtai Li; Chong Zhou; Yining Li; Kai Chen; Chen Change Loy

arXiv:2401.02955·cs.CV·September 17, 2024·2 cites

Open Access · 1 Repo · 1 Model

TL;DR

This paper introduces Open-Vocabulary SAM, a unified model combining SAM and CLIP for interactive segmentation and recognition of around 22,000 classes, significantly outperforming baseline methods.

Contribution

The paper presents a novel framework integrating SAM and CLIP with knowledge transfer modules, enabling large-scale open-vocabulary segmentation and recognition.
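The abstract describes one of these modules, SAM2CLIP, as distilling SAM's knowledge into CLIP through learnable adapters. The sketch below illustrates only the general idea of feature distillation through an adapter, under loud assumptions: a single linear layer stands in for the paper's transformer adapters, random arrays stand in for real SAM and CLIP features, and the names `adapter` and `distillation_loss` are hypothetical, not taken from the official repository.

```python
import numpy as np


def adapter(clip_feats, weight, bias):
    """Hypothetical linear adapter mapping CLIP features to the SAM feature width."""
    return clip_feats @ weight + bias


def distillation_loss(student_feats, teacher_feats):
    """Mean squared error between student (adapted CLIP) and teacher (SAM) features."""
    return float(np.mean((student_feats - teacher_feats) ** 2))


rng = np.random.default_rng(0)
clip_feats = rng.normal(size=(16, 8))  # 16 tokens, 8 dims: stand-in for frozen CLIP features
sam_feats = rng.normal(size=(16, 4))   # 16 tokens, 4 dims: stand-in for frozen SAM features

# Only the adapter parameters are trained; both feature sets stay fixed.
weight = np.zeros((8, 4))
bias = np.zeros(4)
learning_rate = 0.05
losses = []

for _ in range(50):
    pred = adapter(clip_feats, weight, bias)
    err = pred - sam_feats
    losses.append(distillation_loss(pred, sam_feats))
    # Analytic gradients of the mean squared error w.r.t. the adapter parameters.
    grad_w = 2.0 * clip_feats.T @ err / err.size
    grad_b = 2.0 * err.sum(axis=0) / err.size
    weight -= learning_rate * grad_w
    bias -= learning_rate * grad_b

print(f"loss before: {losses[0]:.4f}, after: {losses[-1]:.4f}")
```

The loss falls as the adapter learns to map the student features toward the teacher features. The paper's actual training objective and adapter design may differ from this toy setup.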

Findings

01. Knowledge transfer between SAM and CLIP is shown to be effective.

02. The model recognizes approximately 22,000 classes.

03. It outperforms naive baseline combinations of the two models in the reported experiments.

Abstract

The CLIP and Segment Anything Model (SAM) are remarkable vision foundation models (VFMs). SAM excels in segmentation tasks across diverse domains, whereas CLIP is renowned for its zero-shot recognition capabilities. This paper presents an in-depth exploration of integrating these two models into a unified framework. Specifically, we introduce the Open-Vocabulary SAM, a SAM-inspired model designed for simultaneous interactive segmentation and recognition, leveraging two unique knowledge transfer modules: SAM2CLIP and CLIP2SAM. The former adapts SAM's knowledge into the CLIP via distillation and learnable transformer adapters, while the latter transfers CLIP knowledge into SAM, enhancing its recognition capabilities. Extensive experiments on various datasets and detectors show the effectiveness of Open-Vocabulary SAM in both segmentation and recognition tasks, significantly outperforming…
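As a rough illustration of what "segmentation plus recognition" can mean in a CLIP-style setup, the sketch below pools image features inside a mask and matches the pooled vector against text embeddings by cosine similarity. This is the generic zero-shot recipe associated with CLIP, not the paper's CLIP2SAM module. The helper names (`mask_pool`, `classify`), the toy feature map, the temperature value, and the three-class vocabulary are all assumptions made for the example.

```python
import numpy as np


def mask_pool(feature_map, mask):
    """Average the (H, W, C) feature map over the pixels where the (H, W) mask is True."""
    return feature_map[mask].mean(axis=0)


def l2_normalize(x, axis=-1):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def classify(region_feat, text_embeds, class_names, temperature=0.01):
    """Return the best-matching class name and the softmax over all classes."""
    sims = l2_normalize(text_embeds) @ l2_normalize(region_feat)
    logits = sims / temperature
    probs = np.exp(logits - logits.max())  # subtract the max for numerical stability
    probs /= probs.sum()
    return class_names[int(np.argmax(probs))], probs


# Toy 4x4 feature map with 3 channels; the top-left 2x2 block plays the segmented object.
feature_map = np.tile(np.array([0.0, 0.2, 1.0]), (4, 4, 1))
feature_map[:2, :2] = np.array([0.9, 0.1, 0.0])
mask = np.zeros((4, 4), dtype=bool)
mask[:2, :2] = True

# Hypothetical text embeddings, one row per class name in the vocabulary.
class_names = ["cat", "dog", "car"]
text_embeds = np.eye(3)

label, probs = classify(mask_pool(feature_map, mask), text_embeds, class_names)
print(label)
```

In this toy setup the masked region aligns with the first text embedding, so the predicted label is "cat". Growing the vocabulary only adds rows to `text_embeds`, which is what makes recognition over a very large class list feasible in this style of model.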

Code & Models

Repositories

harboryuan/ovsam · PyTorch · Official

Models

🤗 HarborYuan/ovsam_models · model · ♡ 5

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · COVID-19 diagnosis using AI

Methods: Segment Anything Model · Contrastive Language-Image Pre-training