ReME: A Data-Centric Framework for Training-Free Open-Vocabulary Segmentation

Xiwei Xuan; Ziquan Deng; Kwan-Liu Ma

arXiv:2506.21233 · cs.CV · June 30, 2025

TL;DR

ReME introduces a data-centric framework that leverages high-quality reference sets and simple retrieval methods to significantly improve training-free open-vocabulary segmentation performance across multiple benchmarks.

Contribution

The paper highlights the importance of data quality in training-free OVS and proposes a framework that constructs high-quality reference sets for better segmentation results.

Findings

01. Outperforms existing training-free OVS methods on ten benchmarks.

02. Emphasizes data quality as a key factor for dense scene understanding.

03. Uses a simple similarity-based retrieval approach effectively.

Abstract

Training-free open-vocabulary semantic segmentation (OVS) aims to segment images given a set of arbitrary textual categories without costly model fine-tuning. Existing solutions often explore attention mechanisms of pre-trained models, such as CLIP, or generate synthetic data and design complex retrieval processes to perform OVS. However, their performance is limited by the capability of reliant models or the suboptimal quality of reference sets. In this work, we investigate the largely overlooked data quality problem for this challenging dense scene understanding task, and identify that a high-quality reference set can significantly benefit training-free OVS. With this observation, we introduce a data-quality-oriented framework, comprising a data pipeline to construct a reference set with well-paired segment-text embeddings and a simple similarity-based retrieval to unveil the…
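To make the idea of "similarity-based retrieval" over paired segment-text embeddings more concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract is truncated and does not specify the retrieval rule, so the nearest-neighbor voting scheme, the function name `retrieve_labels`, the `top_k` parameter, and all array shapes are assumptions made purely for illustration. The embeddings are assumed to have been computed beforehand by some pre-trained encoder.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def retrieve_labels(query_segments, ref_segments, ref_text, class_text, top_k=3):
    """Hypothetical retrieval step (an assumption, not the paper's exact method).

    query_segments: (Q, D) embeddings of segments from the test image
    ref_segments:   (R, D) segment embeddings in the reference set
    ref_text:       (R, D) text embeddings paired with ref_segments
    class_text:     (C, D) embeddings of the target category names
    Returns an array of Q class indices into class_text.
    """
    q = l2_normalize(query_segments)
    r = l2_normalize(ref_segments)

    # Cosine similarity between every query segment and every reference segment.
    seg_sim = q @ r.T                                       # (Q, R)

    # Keep the top_k most similar reference segments for each query segment.
    k = min(top_k, r.shape[0])
    nn_idx = np.argsort(-seg_sim, axis=1)[:, :k]            # (Q, k)
    nn_sim = np.take_along_axis(seg_sim, nn_idx, axis=1)    # (Q, k)
    weights = np.clip(nn_sim, 0.0, None)                    # ignore dissimilar neighbors

    # Map each reference entry's paired text embedding onto the target classes.
    text_sim = l2_normalize(ref_text) @ l2_normalize(class_text).T  # (R, C)

    # Similarity-weighted vote over the retrieved neighbors.
    scores = np.einsum("qk,qkc->qc", weights, text_sim[nn_idx])     # (Q, C)
    return scores.argmax(axis=1)


# Toy usage with hand-made 4-dimensional embeddings (illustrative values only).
reference_segments = np.eye(3, 4)      # three reference segments
reference_text = np.eye(3, 4)          # their paired text embeddings
target_classes = np.eye(3, 4)          # three target categories
queries = np.array([[0.1, 1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0, 0.2]])

labels = retrieve_labels(queries, reference_segments, reference_text,
                         target_classes, top_k=2)
print(labels)
```

In this sketch the quality of the result depends entirely on how well each reference segment is paired with its text embedding, which is the data-quality point the abstract emphasizes: a mislabeled reference entry would cast its vote for the wrong class no matter how the retrieval is tuned.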

Code & Models

Repositories

xiweix/reme (Official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques

Methods: Contrastive Language-Image Pre-training · Sparse Evolutionary Training