OpenWorldSAM: Extending SAM2 for Universal Image Segmentation with Language Prompts

Shiting Xiao; Rishabh Kabra; Yuhang Li; Donghyun Lee; Joao Carreira; Priyadarshini Panda

arXiv:2507.05427 · cs.CV · February 3, 2026

TL;DR

OpenWorldSAM extends the SAM2 model to open-vocabulary image segmentation by integrating multi-modal embeddings, enabling flexible prompts, efficient training, and strong zero-shot generalization across various segmentation tasks.

Contribution

The paper introduces a lightweight, prompt-driven framework that combines SAM2 with a vision-language model for universal, open-vocabulary image segmentation with minimal training.

Findings

1. Achieves state-of-the-art results in open-vocabulary segmentation tasks.

2. Supports diverse prompts including category and sentence descriptions.

3. Demonstrates strong zero-shot generalization to unseen categories.

Abstract

The ability to segment objects based on open-ended language prompts remains a critical challenge, requiring models to ground textual semantics into precise spatial masks while handling diverse and unseen categories. We present OpenWorldSAM, a framework that extends the prompt-driven Segment Anything Model v2 (SAM2) to open-vocabulary scenarios by integrating multi-modal embeddings extracted from a lightweight vision-language model (VLM). Our approach is guided by four key principles: i) Unified prompting: OpenWorldSAM supports a diverse range of prompts, including category-level and sentence-level language descriptions, providing a flexible interface for various segmentation tasks. ii) Efficiency: By freezing the pre-trained components of SAM2 and the VLM, we train only 4.5 million parameters on the COCO-stuff dataset, achieving remarkable resource efficiency. iii) Instance Awareness:…
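The efficiency principle in the abstract (freeze SAM2 and the VLM, train only a small set of new parameters) can be illustrated with a framework-agnostic bookkeeping sketch. This is not the authors' code: the component names, the `Component` class, and the helper functions are hypothetical, and the sizes given for the two frozen parts are placeholders rather than figures from the paper. Only the 4.5 million trainable-parameter count comes from the abstract. In PyTorch, the equivalent of `freeze` would be setting `requires_grad` to `False` on the frozen modules' parameters.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Component:
    """One part of the model, with its parameter count and trainability."""
    name: str
    num_params: int
    trainable: bool = True


def freeze(components, names):
    """Return a new list in which the named components are non-trainable."""
    names = set(names)
    return [
        replace(c, trainable=False) if c.name in names else c
        for c in components
    ]


def trainable_params(components):
    """Sum the parameter counts of the components that are still trainable."""
    return sum(c.num_params for c in components if c.trainable)


# Hypothetical component list. The sizes for "sam2" and "vlm" are
# placeholders, NOT figures from the paper; "adapter" is an illustrative
# name for whatever new layers are trained.
components = [
    Component("sam2", 200_000_000),
    Component("vlm", 100_000_000),
    Component("adapter", 4_500_000),  # 4.5M trainable parameters, per the abstract
]

frozen = freeze(components, ["sam2", "vlm"])
budget = trainable_params(frozen)
```

After freezing, `budget` counts only the parameters of the hypothetical `adapter` entry, which is the quantity the abstract reports as 4.5 million.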

Videos

OpenWorldSAM: Extending SAM2 for Universal Image Segmentation with Language Prompts · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning