ROSE: Revolutionizing Open-Set Dense Segmentation with Patch-Wise Perceptual Large Multimodal Model

Kunyang Han; Yibo Hu; Mengxue Qu; Hailin Shi; Yao Zhao; Yunchao Wei

arXiv:2412.00153 · cs.CV · March 12, 2025

Open Access

TL;DR

ROSE introduces a novel open-set dense segmentation model that predicts masks at the patch level and generates categories freely, surpassing limitations of existing models in open environments.

Contribution

The paper presents ROSE, a large multimodal model capable of dense, open-category segmentation with patch-wise perception and a conversation-based refinement paradigm.

Findings

01. ROSE achieves competitive results across multiple segmentation tasks.

02. It enables open-category prediction without predefined labels.

03. The model supports both dense and sparse mask predictions.

Abstract

Advances in CLIP and large multimodal models (LMMs) have enabled open-vocabulary and free-text segmentation, yet existing models still require predefined category prompts, limiting free-form category self-generation. Most segmentation LMMs also remain confined to sparse predictions, restricting their applicability in open-set environments. In contrast, we propose ROSE, a Revolutionary Open-set dense SEgmentation LMM, which enables dense mask prediction and open-category generation through patch-wise perception. Our method treats each image patch as an independent region of interest candidate, enabling the model to predict both dense and sparse masks simultaneously. Additionally, a newly designed instruction-response paradigm takes full advantage of the generation and generalization capabilities of LMMs, achieving category prediction independent of closed-set constraints or predefined…
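
As a rough illustration of the patch-wise idea in the abstract, the sketch below expands a grid of per-patch category labels into a per-pixel label map. It is not the ROSE implementation: the function names `patch_labels_to_mask` and `binary_mask_for`, the patch size, and the use of `None` for an unlabeled patch are all assumptions made for this example. A grid in which every patch carries a label corresponds to a dense prediction, while a grid with only a few labeled patches corresponds to a sparse one.

```python
from typing import List, Optional

# Hypothetical convention: None marks a patch with no predicted category.
Label = Optional[str]


def patch_labels_to_mask(patch_labels: List[List[Label]], patch_size: int) -> List[List[Label]]:
    """Expand a grid of per-patch labels into a per-pixel label map.

    Each patch label is repeated over a patch_size x patch_size block,
    i.e. nearest-neighbour upsampling of the patch grid.
    """
    mask: List[List[Label]] = []
    for patch_row in patch_labels:
        pixel_row: List[Label] = []
        for label in patch_row:
            pixel_row.extend([label] * patch_size)
        # Repeat the expanded row once per pixel row covered by the patch.
        for _ in range(patch_size):
            mask.append(list(pixel_row))
    return mask


def binary_mask_for(mask: List[List[Label]], category: str) -> List[List[int]]:
    """Extract a 0/1 mask for one free-form category name."""
    return [[1 if label == category else 0 for label in row] for row in mask]


# A 2x2 patch grid with free-text labels; one patch is left unlabeled.
grid = [["sky", "sky"], ["grass", None]]
pixel_labels = patch_labels_to_mask(grid, patch_size=2)
grass_mask = binary_mask_for(pixel_labels, "grass")
```

Because the category names are plain strings rather than indices into a fixed label set, the same expansion works for any label the model chooses to generate.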

Taxonomy

Topics: Medical Image Segmentation Techniques · Generative Adversarial Networks and Image Synthesis · Industrial Vision Systems and Defect Detection

Methods: Contrastive Language-Image Pre-training