Open-world Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding

Quande Liu; Youpeng Wen; Jianhua Han; Chunjing Xu; Hang Xu; Xiaodan Liang

arXiv:2207.08455·cs.CV·November 1, 2022·1 cites

Open Access

TL;DR

This paper introduces ViL-Seg, a novel open-world semantic segmentation approach that leverages image-caption data and contrastive learning to segment arbitrary categories without dense annotations.

Contribution

It pioneers a segmentation pipeline that learns from image-caption data alone, eliminating the need for dense annotations and enabling recognition of arbitrary open-world categories.

Findings

01

Outperforms existing zero-shot segmentation methods on three benchmarks.

02

Successfully segments objects of arbitrary categories without dense annotations.

03

Uses contrastive learning and online clustering to enhance segmentation quality.

Abstract

To bridge the gap between supervised semantic segmentation and real-world applications that require one model to recognize arbitrary new concepts, zero-shot segmentation has recently attracted considerable attention by exploring the relationships between unseen and seen object categories, yet it still requires large amounts of densely annotated data with diverse base classes. In this paper, we propose a new open-world semantic segmentation pipeline that makes the first attempt to learn to segment semantic objects of various open-world categories without any dense-annotation effort, by purely exploiting the image-caption data that naturally exist on the Internet. Our method, Vision-language-driven Semantic Segmentation (ViL-Seg), employs an image encoder and a text encoder to generate visual and text embeddings for the image-caption data, with two core components that endow it with segmentation ability: First,…
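
The abstract is cut off before it describes the two core components, so the exact ViL-Seg objective is not available here. As a rough illustration of how image and text embeddings from paired image-caption data are commonly aligned, the following is a minimal sketch of a generic symmetric image-text contrastive (InfoNCE-style) loss. The function names, the temperature value of 0.07, and the toy embeddings are assumptions made for illustration and are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss; row k of each input is assumed to be a matched pair.

    Hypothetical helper: a generic image-text contrastive objective,
    not the loss defined in the ViL-Seg paper.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j] = cosine similarity of image i and caption j, scaled by temperature
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]
    # Image-to-text direction: each image should pick its own caption.
    i2t = -sum(log_softmax(logits[k])[k] for k in range(n)) / n
    # Text-to-image direction: each caption should pick its own image.
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    t2i = -sum(log_softmax(columns[k])[k] for k in range(n)) / n
    return 0.5 * (i2t + t2i)


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]          # toy visual embeddings
    captions = [[1.0, 0.0], [0.0, 1.0]]        # toy text embeddings, aligned by row
    print(contrastive_loss(images, captions))        # small: pairs agree
    print(contrastive_loss(images, captions[::-1]))  # large: pairs are swapped
```

With correctly paired rows the loss is close to zero, and it grows when captions are shuffled against their images, which is the signal that pulls matched image and caption embeddings together.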

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI

Methods: Balanced Selection