Stepping Out of Similar Semantic Space for Open-Vocabulary Segmentation

Yong Liu; SongLi Wu; Sule Bai; Jiahao Wang; Yitong Wang; Yansong Tang

arXiv:2506.16058·cs.CV·June 25, 2025

Open Access

TL;DR

This paper introduces OpenBench, a new benchmark for open-vocabulary segmentation that better evaluates models' understanding of diverse concepts, and proposes OVSNet, a method that achieves state-of-the-art results on this benchmark and existing datasets.

Contribution

The paper presents OpenBench, a benchmark for more rigorous evaluation of open-vocabulary segmentation models, and introduces OVSNet, a method that improves segmentation performance across diverse scenarios.

Findings

01 Existing benchmarks are limited in measuring true open-vocabulary understanding.

02 Models perform worse on OpenBench than on traditional test sets.

03 OVSNet achieves state-of-the-art results on both existing datasets and OpenBench.

Abstract

Open-vocabulary segmentation aims to segment arbitrary categories given unlimited text inputs as guidance. To achieve this, recent works have focused on developing various technical routes to exploit the potential of large-scale pre-trained vision-language models and have made significant progress on existing benchmarks. However, we find that existing test sets are limited in measuring the models' comprehension of "open-vocabulary" concepts, as their semantic space closely resembles the training space and even shares many overlapping categories. To this end, we present a new benchmark named OpenBench that differs significantly from the training semantics. It is designed to better assess the model's ability to understand and segment a wide range of real-world concepts. When testing existing methods on OpenBench, we find that their performance diverges from the conclusions…

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques