WOW-Seg: A Word-free Open World Segmentation Model

Danyang Li, Tianhao Wu, Bin Li, Zhenyuan Chen, Yang Zhang, Yuxuan Li, Ming-Ming Cheng, Xiang Li

arXiv:2605.16903 · cs.CV · May 19, 2026

TL;DR

WOW-Seg is an open world image segmentation model that pairs a visual prompt module (Mask2Token) with a new region recognition dataset (RR-7K, 7,662 classes) to segment and recognize objects from open-set categories.

Contribution

The paper introduces WOW-Seg, a word-free open world segmentation model with a novel visual prompt module and a new large-scale region recognition dataset, improving semantic understanding in open-set scenarios.

Findings

01. Achieves 89.7 semantic similarity and 82.4 semantic IoU on the LVIS dataset.

02. Surpasses the previous state of the art while using only one-eighth of the parameters.

03. Constructs RR-7K, a large-scale region recognition dataset with 7,662 classes.
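
The semantic IoU figure above is easier to read with a concrete formula in mind. This summary does not define the metric, so the sketch below assumes one definition used in region recognition work: the overlap between the set of words in the predicted label and the set of words in the ground-truth label, averaged over regions and scaled to 0-100. The function names `label_words`, `semantic_iou`, and `mean_semantic_iou` are hypothetical, and the paper's exact evaluation protocol may differ.

```python
import re


def label_words(label: str) -> set[str]:
    """Lowercase a label and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", label.lower()))


def semantic_iou(predicted: str, target: str) -> float:
    """Word-level IoU between two labels (assumed definition, see lead-in)."""
    pred_words = label_words(predicted)
    target_words = label_words(target)
    union = pred_words | target_words
    if not union:
        # Both labels are empty: treat as no overlap rather than dividing by zero.
        return 0.0
    return len(pred_words & target_words) / len(union)


def mean_semantic_iou(pairs: list[tuple[str, str]]) -> float:
    """Average word-level IoU over (predicted, target) pairs, scaled to 0-100."""
    if not pairs:
        raise ValueError("at least one (predicted, target) pair is required")
    return 100.0 * sum(semantic_iou(p, t) for p, t in pairs) / len(pairs)
```

Under this definition, a prediction of "red traffic light" against a ground truth of "traffic light" shares two of three distinct words, so partial credit is given for labels that are close but not identical.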

Abstract

Open world image segmentation aims to achieve precise segmentation and semantic understanding of targets within images by addressing the infinitely open set of object categories encountered in the real world. However, traditional closed-set segmentation approaches struggle to adapt to complex open world scenarios, while foundation segmentation models such as SAM exhibit notable discrepancies between their strong segmentation capabilities and relatively weaker semantic understanding. To bridge these discrepancies, we propose WOW-Seg, a Word-free Open World Segmentation model for segmenting and recognizing objects from open-set categories. Specifically, WOW-Seg introduces a novel visual prompt module, Mask2Token, which transforms image masks into visual tokens and ensures their alignment with the VLLM feature space. Moreover, we introduce the Cascade Attention Mask to decouple information…
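
The abstract says Mask2Token turns image masks into visual tokens aligned with the VLLM feature space, but it does not describe how. As a rough illustration of the general idea only, the sketch below uses a generic stand-in technique: masked average pooling over a feature map, followed by a linear projection into a target embedding size. This is not the authors' module; `mask_to_token`, `project`, and the tiny example shapes are all hypothetical.

```python
def mask_to_token(features, mask):
    """Masked average pooling (a generic stand-in, not the paper's Mask2Token).

    features: H x W x C nested lists of floats (an image feature map).
    mask:     H x W nested lists of 0/1 values (a binary region mask).
    Returns a length-C list: the mean feature vector over masked positions.
    """
    channels = len(features[0][0])
    total = [0.0] * channels
    count = 0
    for feature_row, mask_row in zip(features, mask):
        for vector, inside in zip(feature_row, mask_row):
            if inside:
                count += 1
                for c in range(channels):
                    total[c] += vector[c]
    if count == 0:
        # Empty mask: return the zero vector instead of dividing by zero.
        return total
    return [value / count for value in total]


def project(token, weight):
    """Linear projection from C to D dimensions; weight has D rows of C entries."""
    return [sum(w * x for w, x in zip(row, token)) for row in weight]


# Usage: a 2 x 2 feature map with 2 channels and a mask selecting two pixels.
features = [[[1.0, 2.0], [3.0, 4.0]],
            [[5.0, 6.0], [7.0, 8.0]]]
mask = [[1, 0],
        [0, 1]]
region_token = mask_to_token(features, mask)
# Hypothetical projection weights mapping 2 channels to a 3-dimensional space.
aligned_token = project(region_token, [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

In a real system the projection weights would be learned so that the pooled region feature lands in the language model's embedding space; here they are fixed numbers chosen only to keep the example self-contained.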

Code & Models

Repositories

AAwcAA/WOW-Seg-Meta (GitHub)

Videos

WOW-Seg: A Word-free Open World Segmentation Model (SlidesLive)