ReME: A Data-Centric Framework for Training-Free Open-Vocabulary Segmentation
Xiwei Xuan, Ziquan Deng, Kwan-Liu Ma

TL;DR
ReME introduces a data-centric framework that leverages high-quality reference sets and simple retrieval methods to significantly improve training-free open-vocabulary segmentation performance across multiple benchmarks.
Contribution
The paper highlights the importance of data quality in training-free OVS and proposes a framework that constructs high-quality reference sets for better segmentation results.
Findings
Outperforms existing training-free OVS methods on ten benchmarks.
Emphasizes data quality as a key factor for dense scene understanding.
Shows that a simple similarity-based retrieval approach is effective when paired with a high-quality reference set.
Abstract
Training-free open-vocabulary semantic segmentation (OVS) aims to segment images given a set of arbitrary textual categories without costly model fine-tuning. Existing solutions often explore attention mechanisms of pre-trained models, such as CLIP, or generate synthetic data and design complex retrieval processes to perform OVS. However, their performance is limited by the capability of reliant models or the suboptimal quality of reference sets. In this work, we investigate the largely overlooked data quality problem for this challenging dense scene understanding task, and identify that a high-quality reference set can significantly benefit training-free OVS. With this observation, we introduce a data-quality-oriented framework, comprising a data pipeline to construct a reference set with well-paired segment-text embeddings and a simple similarity-based retrieval to unveil the…
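To make the retrieval idea concrete, here is a minimal sketch of similarity-based label retrieval over a reference set of paired segment embeddings and text labels. It is an illustration under stated assumptions, not the authors' implementation: the function names (`cosine`, `retrieve_label`), the toy 3-dimensional embeddings, the top-k similarity-weighted vote, and the label filter for the requested categories are all hypothetical choices made for this example. In practice the embeddings would come from a pre-trained vision-language encoder.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_label(query, reference, categories=None, k=3):
    """Assign a label to one query segment embedding.

    reference:  list of (embedding, label) pairs (the hypothetical reference set).
    categories: optional set of labels to restrict retrieval to (an assumption
                standing in for the user-supplied open-vocabulary categories).
    k:          number of nearest reference entries that vote.
    Returns the winning label, or None if no reference entry is eligible.
    """
    pool = [(emb, lab) for emb, lab in reference
            if categories is None or lab in categories]
    if not pool:
        return None
    # Score every eligible reference entry once, then keep the k most similar.
    scored = sorted(((cosine(query, emb), lab) for emb, lab in pool),
                    key=lambda pair: pair[0], reverse=True)
    votes = Counter()
    for score, lab in scored[:k]:
        votes[lab] += score  # similarity-weighted vote
    return votes.most_common(1)[0][0]


# Toy reference set: made-up embeddings, for illustration only.
REFERENCE = [
    ([1.0, 0.0, 0.0], "cat"),
    ([0.9, 0.1, 0.0], "cat"),
    ([0.0, 1.0, 0.0], "dog"),
    ([0.0, 0.9, 0.1], "dog"),
    ([0.0, 0.0, 1.0], "sky"),
]

if __name__ == "__main__":
    print(retrieve_label([0.95, 0.05, 0.0], REFERENCE))
    print(retrieve_label([0.0, 0.1, 0.95], REFERENCE, categories={"cat", "dog"}))
```

In this sketch the quality of the result depends entirely on how well each reference embedding matches its label, which is the point the abstract makes about data quality: the retrieval step itself stays deliberately simple.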
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
Methods: Contrastive Language-Image Pre-training · Sparse Evolutionary Training
