ROSE: Retrieval-Oriented Segmentation Enhancement

Song Tang; Guangquan Jie; Henghui Ding; Yu-Gang Jiang

arXiv:2604.14147·cs.CV·April 16, 2026

TL;DR

This paper introduces ROSE, a retrieval-oriented framework that enhances multimodal large language models for better segmentation of novel and emerging entities by integrating real-time web information and visual data.

Contribution

The paper proposes ROSE, a plug-and-play framework whose retrieval and prompt-enhancement modules help MLLM-based segmentation models handle novel and emerging entities, and it introduces the NEST benchmark for evaluation.
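To make "plug-and-play" concrete, here is a minimal sketch of how a retrieval step and a prompt-enhancement step could wrap an existing segmentation model without modifying it. This is not the authors' implementation: the names `Retriever`, `Segmenter`, `enhance_prompt`, and `segment_with_retrieval`, as well as the prompt template, are hypothetical and chosen only for illustration.

```python
from typing import Callable, List

# Hypothetical interfaces (assumptions, not taken from the paper):
# a retriever maps a text query to a list of text snippets, and a
# segmenter maps (image_path, prompt) to some mask object.
Retriever = Callable[[str], List[str]]
Segmenter = Callable[[str, str], object]


def enhance_prompt(query: str, snippets: List[str], max_snippets: int = 3) -> str:
    """Prepend up to max_snippets retrieved snippets to the user query."""
    if not snippets:
        # Nothing retrieved: fall back to the original query unchanged.
        return query
    context = "\n".join(f"- {s}" for s in snippets[:max_snippets])
    return f"Background:\n{context}\n\nInstruction: {query}"


def segment_with_retrieval(
    image_path: str, query: str, retrieve: Retriever, segment: Segmenter
) -> object:
    """Run retrieval, enhance the prompt, then call the unmodified segmenter."""
    snippets = retrieve(query)
    return segment(image_path, enhance_prompt(query, snippets))
```

Because the segmenter is passed in as a plain callable, any model with an image-plus-text interface could in principle be swapped in, which is the sense of "plug-and-play" assumed in this sketch.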

Findings

01

ROSE outperforms the baseline by 19.2 gIoU on the NEST benchmark.

02

The framework effectively incorporates real-time web information and images.

03

ROSE significantly improves segmentation of emerging entities.
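The gIoU figure in the first finding is not defined on this page. A minimal sketch follows, assuming the convention used in LISA-style reasoning segmentation, where gIoU is the mean of per-image IoU scores. The function names, the nested-list mask representation, and the handling of empty masks are assumptions made for illustration.

```python
from typing import Sequence

# A binary mask as rows of 0/1 values (representation assumed for this sketch).
Mask = Sequence[Sequence[int]]


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two same-shaped binary masks."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def giou(preds: Sequence[Mask], targets: Sequence[Mask]) -> float:
    """Mean of per-image IoUs (the gIoU definition assumed here)."""
    if len(preds) == 0 or len(preds) != len(targets):
        raise ValueError("preds and targets must be non-empty and equal in length")
    scores = [iou(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)
```

Under this definition every image contributes equally regardless of object size. If scores are reported on a 0 to 100 scale, a gain of 19.2 gIoU corresponds to 19.2 percentage points of mean per-image IoU.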

Abstract

Existing segmentation models based on multimodal large language models (MLLMs), such as LISA, often struggle with novel or emerging entities due to their inability to incorporate up-to-date knowledge. To address this challenge, we introduce the Novel Emerging Segmentation Task (NEST), which focuses on segmenting (i) novel entities that MLLMs fail to recognize due to their absence from training data, and (ii) emerging entities that exist within the model's knowledge but demand up-to-date external information for accurate recognition. To support the study of NEST, we construct a NEST benchmark using an automated pipeline that generates news-related data samples for comprehensive evaluation. Additionally, we propose ROSE: Retrieval-Oriented Segmentation Enhancement, a plug-and-play framework designed to augment any MLLM-based segmentation model. ROSE comprises four key components. First,…
