Segment Everything Everywhere All at Once
Xueyan Zou, Jianwei Yang, Hao Zhang, Feng Li, Linjie Li, Jianfeng Wang, Lijuan Wang, Jianfeng Gao, Yong Jae Lee

TL;DR
SEEM is a versatile, promptable, and interactive universal segmentation model that unifies various segmentation tasks through a novel decoding mechanism and semantic-aware prompts, demonstrating broad applicability and strong generalization.
Contribution
The paper introduces SEEM, a universal segmentation model with a new decoding mechanism and visual-semantic prompt integration, enabling diverse and interactive segmentation tasks.
Findings
Achieves competitive performance across multiple segmentation datasets.
Demonstrates strong generalization to novel prompts and combinations.
Operates effectively with minimal supervision.
Abstract
In this work, we present SEEM, a promptable and interactive model for segmenting everything everywhere all at once in an image, as shown in Fig.1. In SEEM, we propose a novel decoding mechanism that enables diverse prompting for all types of segmentation tasks, aiming at a universal segmentation interface that behaves like large language models (LLMs). More specifically, SEEM is designed with four desiderata: i) Versatility. We introduce a new visual prompt to unify different spatial queries including points, boxes, scribbles and masks, which can further generalize to a different referring image; ii) Compositionality. We learn a joint visual-semantic space between text and visual prompts, which facilitates the dynamic composition of two prompt types required for various segmentation tasks; iii) Interactivity. We further incorporate learnable memory prompts into the decoder to retain…
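To make the "Versatility" idea concrete, here is a minimal sketch of one way different spatial queries could be reduced to a single prompt representation. It is an illustration under assumptions, not SEEM's actual implementation: the helper names (`mask_from_points`, `mask_from_box`, `pool_prompt`), the nested-list feature map, and the use of masked average pooling are all hypothetical choices made for this example. The sketch rasterizes a point, scribble, or box into a binary mask and averages the image features under that mask, so every query type yields a vector of the same shape.

```python
from typing import List, Tuple

Mask = List[List[int]]                 # H x W binary grid
FeatureMap = List[List[List[float]]]   # H x W x D image features (assumed layout)


def empty_mask(h: int, w: int) -> Mask:
    return [[0] * w for _ in range(h)]


def mask_from_points(h: int, w: int, points: List[Tuple[int, int]]) -> Mask:
    """Points and scribbles: mark each (row, col) pixel that was touched."""
    m = empty_mask(h, w)
    for r, c in points:
        m[r][c] = 1
    return m


def mask_from_box(h: int, w: int, box: Tuple[int, int, int, int]) -> Mask:
    """Box given as (top, left, bottom, right), with inclusive bounds."""
    top, left, bottom, right = box
    m = empty_mask(h, w)
    for r in range(top, bottom + 1):
        for c in range(left, right + 1):
            m[r][c] = 1
    return m


def pool_prompt(features: FeatureMap, mask: Mask) -> List[float]:
    """Average the feature vectors at masked pixels (hypothetical pooling rule)."""
    dim = len(features[0][0])
    total = [0.0] * dim
    count = 0
    for r, row in enumerate(mask):
        for c, on in enumerate(row):
            if on:
                count += 1
                for d in range(dim):
                    total[d] += features[r][c][d]
    if count == 0:
        raise ValueError("empty prompt mask")
    return [t / count for t in total]


# Toy 2 x 2 feature map whose feature at (r, c) is simply [r, c].
feats = [[[float(r), float(c)] for c in range(2)] for r in range(2)]
point_prompt = pool_prompt(feats, mask_from_points(2, 2, [(0, 1)]))
box_prompt = pool_prompt(feats, mask_from_box(2, 2, (0, 0, 1, 1)))
```

Because both prompts come out as vectors of the same dimension, a downstream decoder could treat a click and a box identically; a user-drawn mask would be passed to `pool_prompt` directly.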
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning
