Omni-Referring Image Segmentation

Qiancheng Zheng; Yunhang Shen; Gen Luo; Baiyang Song; Xing Sun; Xiaoshuai Sun; Yiyi Zhou; Rongrong Ji

arXiv:2512.06862 · cs.CV · December 9, 2025


TL;DR

This paper introduces Omni-Referring Image Segmentation (OmniRIS), a new task that integrates text and visual prompts for highly flexible and generalized image segmentation, supported by a large dataset and a strong baseline model.

Contribution

The paper defines OmniRIS, creates the OmniRef dataset, and proposes OmniSegNet, advancing multi-modal segmentation with omni-prompts and setting new benchmarks.

Findings

01

OmniSegNet effectively follows omni-modal instructions.

02

OmniSegNet outperforms existing segmentation methods.

03

The OmniRef dataset enables comprehensive evaluation.

Abstract

In this paper, we propose a novel task termed Omni-Referring Image Segmentation (OmniRIS) towards highly generalized image segmentation. Compared with existing unimodally conditioned segmentation tasks, such as RIS and visual RIS, OmniRIS accepts text instructions and reference images with masks, boxes or scribbles as omni-prompts. This property enables it to fully exploit the intrinsic merits of both the text and visual modalities, i.e., granular attribute referring and uncommon object grounding, respectively. In addition, OmniRIS can handle various segmentation settings, such as one vs. many and many vs. many, further facilitating its practical use. To promote research on OmniRIS, we also rigorously design and construct a large dataset termed OmniRef, which consists of 186,939 omni-prompts for 30,956 images, and establish a comprehensive evaluation system. Moreover, a…
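The abstract describes an omni-prompt as a text instruction, a reference image marked with a mask, box or scribble, or both together. As a rough illustration only, the sketch below shows one way such a prompt could be represented and validated. The names `OmniPrompt`, `VisualPrompt` and `VisualKind`, and every field in them, are hypothetical assumptions for this example; they are not the OmniRef file format or OmniSegNet's interface.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple, Union

# Hypothetical region payloads, one per visual prompt type named in the abstract.
Box = Tuple[float, float, float, float]   # (x0, y0, x1, y1) in pixels
Scribble = List[Tuple[float, float]]      # polyline of (x, y) points
MaskPath = str                            # path to a binary mask file


class VisualKind(Enum):
    MASK = "mask"
    BOX = "box"
    SCRIBBLE = "scribble"


@dataclass
class VisualPrompt:
    """A region marked on a reference image (hypothetical schema)."""
    reference_image: str                  # path or ID of the reference image
    kind: VisualKind
    region: Union[MaskPath, Box, Scribble]

    def __post_init__(self) -> None:
        # Check that the region payload matches the declared prompt kind.
        if self.kind is VisualKind.BOX and not (
            isinstance(self.region, tuple) and len(self.region) == 4
        ):
            raise ValueError("box prompts need (x0, y0, x1, y1)")
        if self.kind is VisualKind.SCRIBBLE and not (
            isinstance(self.region, list) and len(self.region) > 0
        ):
            raise ValueError("scribble prompts need a non-empty list of points")
        if self.kind is VisualKind.MASK and not isinstance(self.region, str):
            raise ValueError("mask prompts need a mask path")


@dataclass
class OmniPrompt:
    """Text, visual prompts, or both (hypothetical schema)."""
    text: Optional[str] = None
    visual: List[VisualPrompt] = field(default_factory=list)

    def __post_init__(self) -> None:
        # An omni-prompt must carry at least one modality.
        if not self.text and not self.visual:
            raise ValueError("an omni-prompt needs text, a visual prompt, or both")

    def modalities(self) -> Tuple[str, ...]:
        """Return the modalities present: 'text' first, then sorted visual kinds."""
        mods: List[str] = []
        if self.text:
            mods.append("text")
        mods.extend(sorted({v.kind.value for v in self.visual}))
        return tuple(mods)


if __name__ == "__main__":
    prompt = OmniPrompt(
        text="the striped one",
        visual=[VisualPrompt("ref.jpg", VisualKind.BOX, (10, 20, 110, 220))],
    )
    print(prompt.modalities())  # ('text', 'box')
```

Keeping text optional and the visual prompts as a list lets the same structure express text-only, visual-only, and combined prompts, which mirrors the flexibility the abstract attributes to omni-prompts.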


Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Handwritten Text Recognition Techniques