Auto-Vocabulary Semantic Segmentation

Osman Ülger; Maksymilian Kulicki; Yuki Asano; Martin R. Oswald

arXiv:2312.04539 · cs.CV · March 13, 2025 · 2 cites

TL;DR

Auto-Vocabulary Semantic Segmentation (AVS) autonomously identifies and segments relevant object classes in images without predefined categories, advancing open-ended image understanding and outperforming previous methods on multiple datasets.

Contribution

Introduces AVS, a framework that automatically generates object categories and segments them, removing the need for predefined vocabularies in semantic segmentation.

Findings

01 Sets new benchmarks on the PASCAL VOC, PASCAL Context, ADE20K, and Cityscapes datasets.

02 Achieves performance competitive with methods that require predefined class names.

03 Develops LAVE, an LLM-based Auto-Vocabulary Evaluator, for evaluating automatically generated classes and segments.

Abstract

Open-Vocabulary Segmentation (OVS) methods are capable of performing semantic segmentation without relying on a fixed vocabulary, and in some cases, without training or fine-tuning. However, OVS methods typically require a human in the loop to specify the vocabulary based on the task or dataset at hand. In this paper, we introduce Auto-Vocabulary Semantic Segmentation (AVS), advancing open-ended image understanding by eliminating the necessity to predefine object categories for segmentation. Our approach, AutoSeg, presents a framework that autonomously identifies relevant class names using semantically enhanced BLIP embeddings and segments them afterwards. Given that open-ended object category predictions cannot be directly compared with a fixed ground truth, we develop a Large Language Model-based Auto-Vocabulary Evaluator (LAVE) to efficiently evaluate the automatically generated…
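The abstract notes that open-ended class predictions cannot be compared directly with a fixed ground truth. The sketch below illustrates that problem on a toy example; it is not the authors' LAVE implementation. It assumes that each predicted name is first mapped to a ground-truth class and that the result is then scored with mean intersection-over-union. The dictionary NAME_TO_GT is a hypothetical stand-in for the LLM, and the function names map_names and mean_iou, the "background" fallback, and the label grids are all invented for illustration.

```python
from typing import Dict, List

# Hypothetical stand-in for the LLM: maps an open-ended predicted name
# to a class of the fixed ground-truth vocabulary (assumption, not from the paper).
NAME_TO_GT: Dict[str, str] = {
    "puppy": "dog",
    "sofa": "couch",
    "grass": "grass",
}


def map_names(pred: List[List[str]],
              mapping: Dict[str, str],
              unknown: str = "background") -> List[List[str]]:
    """Replace each predicted per-pixel name with its mapped ground-truth class.

    Names missing from the mapping fall back to `unknown`.
    """
    return [[mapping.get(name, unknown) for name in row] for row in pred]


def mean_iou(pred: List[List[str]],
             gt: List[List[str]],
             classes: List[str]) -> float:
    """Mean intersection-over-union over the classes that occur in pred or gt."""
    ious = []
    for c in classes:
        inter = 0
        union = 0
        for pred_row, gt_row in zip(pred, gt):
            for p, g in zip(pred_row, gt_row):
                inter += int(p == c and g == c)
                union += int(p == c or g == c)
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy 2x3 label maps: open-ended predictions vs. fixed-vocabulary ground truth.
pred_names = [["puppy", "puppy", "grass"],
              ["sofa", "grass", "grass"]]
gt_names = [["dog", "dog", "grass"],
            ["couch", "couch", "grass"]]

mapped = map_names(pred_names, NAME_TO_GT)
score = mean_iou(mapped, gt_names, ["dog", "couch", "grass"])
print(f"mIoU after name mapping: {score:.4f}")
```

Without the mapping step, "puppy" and "dog" would count as different classes and the score would be misleadingly low, which is why an evaluator for auto-generated vocabularies needs some form of name alignment before computing a segmentation metric.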

Code & Models

Repositories

ozzyou/autoseg (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: BLIP (Bootstrapping Language-Image Pre-training)