ROSE: Revolutionizing Open-Set Dense Segmentation with Patch-Wise Perceptual Large Multimodal Model
Kunyang Han, Yibo Hu, Mengxue Qu, Hailin Shi, Yao Zhao, Yunchao Wei

TL;DR
ROSE is an open-set dense segmentation model that predicts masks at the patch level and generates category names freely, rather than relying on the predefined category prompts and sparse predictions that limit existing models in open environments.
Contribution
The paper presents ROSE, a large multimodal model capable of dense, open-category segmentation with patch-wise perception and a conversation-based refinement paradigm.
Findings
ROSE achieves competitive results across multiple segmentation tasks.
It enables open-category prediction without predefined labels.
The model supports both dense and sparse mask predictions.
Abstract
Advances in CLIP and large multimodal models (LMMs) have enabled open-vocabulary and free-text segmentation, yet existing models still require predefined category prompts, limiting free-form category self-generation. Most segmentation LMMs also remain confined to sparse predictions, restricting their applicability in open-set environments. In contrast, we propose ROSE, a Revolutionary Open-set dense SEgmentation LMM, which enables dense mask prediction and open-category generation through patch-wise perception. Our method treats each image patch as an independent region of interest candidate, enabling the model to predict both dense and sparse masks simultaneously. Additionally, a newly designed instruction-response paradigm takes full advantage of the generation and generalization capabilities of LMMs, achieving category prediction independent of closed-set constraints or predefined…
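To make the patch-wise idea concrete, here is a minimal sketch in plain Python. It is not the ROSE implementation: the function names, the patch size, and the hard-coded per-patch labels are all hypothetical stand-ins for whatever the model actually produces. It only illustrates how treating every patch as a region-of-interest candidate can yield a dense map (every patch labelled) and a sparse mask (only patches matching one category) from the same per-patch predictions.

```python
from typing import List, Tuple


def patch_grid(height: int, width: int, patch: int) -> List[Tuple[int, int]]:
    """List the (row, col) index of every patch covering the image.

    Border patches are counted even if the image size is not a multiple
    of the patch size (ceiling division).
    """
    rows = -(-height // patch)
    cols = -(-width // patch)
    return [(r, c) for r in range(rows) for c in range(cols)]


def patches_to_pixel_map(labels: List[List[str]], height: int, width: int,
                         patch: int) -> List[List[str]]:
    """Expand per-patch labels into a dense per-pixel label map."""
    return [[labels[y // patch][x // patch] for x in range(width)]
            for y in range(height)]


def sparse_mask(labels: List[List[str]], target: str) -> List[List[int]]:
    """Binary patch-level mask that keeps only patches labelled `target`."""
    return [[1 if lab == target else 0 for lab in row] for row in labels]


# Hypothetical per-patch category predictions for a 4x6 image with 2x2
# patches. In an open-set model these strings would be generated freely
# rather than picked from a fixed label list.
patch_labels = [
    ["sky", "sky", "tree"],
    ["grass", "dog", "tree"],
]

candidates = patch_grid(4, 6, 2)                      # one candidate per patch
dense = patches_to_pixel_map(patch_labels, 4, 6, 2)   # every pixel labelled
tree_only = sparse_mask(patch_labels, "tree")         # single-category mask
```

In this toy setup the dense and sparse outputs are two views of the same per-patch labels, which is the sense in which one set of patch-level predictions can serve both kinds of mask.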
Taxonomy
Topics: Medical Image Segmentation Techniques · Generative Adversarial Networks and Image Synthesis · Industrial Vision Systems and Defect Detection
Methods: Contrastive Language-Image Pre-training
