Segment Everything Everywhere All at Once

Xueyan Zou; Jianwei Yang; Hao Zhang; Feng Li; Linjie Li; Jianfeng Wang; Lijuan Wang; Jianfeng Gao; Yong Jae Lee

arXiv:2304.06718 · cs.CV · July 13, 2023 · 151 cites

TL;DR

SEEM is a versatile, promptable, and interactive universal segmentation model that unifies various segmentation tasks through a novel decoding mechanism and semantic-aware prompts, demonstrating broad applicability and strong generalization.

Contribution

The paper introduces SEEM, a universal segmentation model with a new decoding mechanism and visual-semantic prompt integration, enabling diverse and interactive segmentation tasks.

Findings

01. Achieves competitive performance across multiple segmentation datasets.

02. Demonstrates strong generalization to novel prompts and combinations.

03. Operates effectively with minimal supervision.

Abstract

In this work, we present SEEM, a promptable and interactive model for segmenting everything everywhere all at once in an image, as shown in Fig. 1. In SEEM, we propose a novel decoding mechanism that enables diverse prompting for all types of segmentation tasks, aiming at a universal segmentation interface that behaves like large language models (LLMs). More specifically, SEEM is designed with four desiderata: i) Versatility. We introduce a new visual prompt to unify different spatial queries including points, boxes, scribbles and masks, which can further generalize to a different referring image; ii) Compositionality. We learn a joint visual-semantic space between text and visual prompts, which facilitates the dynamic composition of two prompt types required for various segmentation tasks; iii) Interactivity. We further incorporate learnable memory prompts into the decoder to retain…
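
The versatility point in the abstract (points, boxes, scribbles and masks treated as one kind of visual prompt) can be illustrated with a small sketch. This is not SEEM's actual implementation: it assumes, purely for illustration, that every spatial query is first rasterized into a boolean region and then reduced to a single prompt vector by averaging image features inside that region. The function names `rasterize_query` and `pool_prompt`, the `radius` parameter, and the (H, W, C) feature layout are all hypothetical.

```python
import numpy as np

def rasterize_query(kind, data, height, width, radius=1):
    """Turn a point, box, scribble, or mask query into a boolean (H, W) region.

    Hypothetical helper: coordinates are (x, y) pixel positions, boxes are
    (x0, y0, x1, y1) with inclusive corners, scribbles are lists of points.
    """
    region = np.zeros((height, width), dtype=bool)
    if kind == "mask":
        region = np.asarray(data, dtype=bool).copy()
    elif kind == "box":
        x0, y0, x1, y1 = data
        region[y0:y1 + 1, x0:x1 + 1] = True
    elif kind in ("point", "scribble"):
        points = [data] if kind == "point" else data
        for x, y in points:
            # Mark a small square around each point, clipped at the image border.
            region[max(y - radius, 0):y + radius + 1,
                   max(x - radius, 0):x + radius + 1] = True
    else:
        raise ValueError(f"unknown query kind: {kind}")
    return region

def pool_prompt(features, region):
    """Average the (H, W, C) features inside the region into one C-dim vector."""
    if not region.any():
        raise ValueError("empty region")
    return features[region].mean(axis=0)

# Toy feature map: 4x4 pixels, 2 channels.
features = np.arange(4 * 4 * 2, dtype=float).reshape(4, 4, 2)
box_region = rasterize_query("box", (1, 1, 2, 2), 4, 4)
box_prompt = pool_prompt(features, box_region)
point_region = rasterize_query("point", (3, 0), 4, 4, radius=0)
point_prompt = pool_prompt(features, point_region)
```

Under these assumptions every query type ends up as a vector of the same size, so a downstream decoder would not need to know which kind of interaction produced it. How SEEM itself encodes visual prompts is described in the paper, not here.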

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning