USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation
Xiaoqi Wang, Wenbin He, Xiwei Xuan, Clint Sebastian, Jorge Piazentin Ono, Xin Li, Sima Behpour, Thang Doan, Liang Gou, Han-Wei Shen, and Liu Ren

TL;DR
This paper introduces the Universal Segment Embedding (USE) framework, which enhances open-vocabulary image segmentation by accurately classifying segments into diverse text-defined categories, outperforming existing methods.
Contribution
The paper presents a novel USE framework with a data pipeline and universal segment embedding model for improved open-vocabulary segmentation and downstream tasks.
Findings
Outperforms state-of-the-art open-vocabulary segmentation methods
Effective in semantic and part segmentation benchmarks
Facilitates downstream tasks like querying and ranking
Abstract
The open-vocabulary image segmentation task involves partitioning images into semantically meaningful segments and classifying them with flexible text-defined categories. The recent vision-based foundation models such as the Segment Anything Model (SAM) have shown superior performance in generating class-agnostic image segments. The main challenge in open-vocabulary image segmentation now lies in accurately classifying these segments into text-defined categories. In this paper, we introduce the Universal Segment Embedding (USE) framework to address this challenge. This framework is comprised of two key components: 1) a data pipeline designed to efficiently curate a large amount of segment-text pairs at various granularities, and 2) a universal segment embedding model that enables precise segment classification into a vast range of text-defined categories. The USE model can not only help…
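The classification step described in the abstract can be pictured as matching each segment embedding against a set of text embeddings. The sketch below is illustrative only: the function names, the toy three-dimensional vectors, and the choice of cosine similarity are assumptions for the example and are not taken from the paper. In practice the embeddings would come from the USE model and a text encoder.

```python
import math


def normalize(vector):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def classify_segments(segment_embeddings, text_embeddings, labels):
    """Give each segment the label whose text embedding is most similar.

    Similarity is cosine similarity, an assumption made for this sketch.
    Returns a list of (label, score) pairs, one per segment.
    """
    unit_texts = [normalize(t) for t in text_embeddings]
    results = []
    for segment in segment_embeddings:
        unit_segment = normalize(segment)
        # Dot product of unit vectors equals cosine similarity.
        scores = [
            sum(a * b for a, b in zip(unit_segment, text))
            for text in unit_texts
        ]
        best = max(range(len(labels)), key=scores.__getitem__)
        results.append((labels[best], scores[best]))
    return results


# Hypothetical toy data: one text embedding per category name.
labels = ["cat", "wheel", "sky"]
text_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Hypothetical embeddings for two class-agnostic segments.
segment_embeddings = [[0.9, 0.1, 0.0], [0.1, 0.2, 0.95]]

predictions = classify_segments(segment_embeddings, text_embeddings, labels)
print([label for label, _ in predictions])
```

Because the category list is just text, new categories can be added at query time by embedding more labels, which is what makes the setup open-vocabulary. Sorting segments by their score for a single label would give a simple ranking of the kind mentioned under Findings.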
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Radiomics and Machine Learning in Medical Imaging
Methods: Multilingual Universal Sentence Encoder
