OpenSeg-R: Improving Open-Vocabulary Segmentation via Step-by-Step Visual Reasoning

Zongyan Han; Jiale Cao; Shuo Chen; Tong Wang; Jorma Laaksonen; Rao Muhammad Anwer

arXiv:2505.16974·cs.CV·August 4, 2025

TL;DR

OpenSeg-R introduces a novel step-by-step visual reasoning framework using Large Multimodal Models to enhance open-vocabulary segmentation, significantly improving accuracy and interpretability over existing methods.

Contribution

This paper presents the first explicit step-by-step visual reasoning approach for open-vocabulary segmentation, leveraging hierarchical reasoning to improve segmentation accuracy and interpretability.

Findings

01 Outperforms state-of-the-art on five benchmark datasets

02 Achieves consistent gains in open-vocabulary panoptic segmentation

03 Enhances segmentation precision and interpretability

Abstract

Open-Vocabulary Segmentation (OVS) has drawn increasing attention for its capacity to generalize segmentation beyond predefined categories. However, existing methods typically predict segmentation masks with simple forward inference, lacking explicit reasoning and interpretability. This makes it challenging for OVS models to distinguish similar categories in open-world settings, because they lack contextual understanding and discriminative visual cues. To address this limitation, we propose a step-by-step visual reasoning framework for open-vocabulary segmentation, named OpenSeg-R. OpenSeg-R leverages Large Multimodal Models (LMMs) to perform hierarchical visual reasoning before segmentation. Specifically, we generate both generic and image-specific reasoning for each image, forming structured triplets that explain the visual reason for objects in a coarse-to-fine manner.…

Code & Models

Repositories

hanzy1996/openseg-r
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning

Methods: Softmax · Attention Is All You Need