SemiVL: Semi-Supervised Semantic Segmentation with Vision-Language Guidance

Lukas Hoyer; David Joseph Tan; Muhammad Ferjad Naeem; Luc Van Gool; Federico Tombari

arXiv:2311.16241·cs.CV·November 29, 2023·2 cites

Open Access · 1 Repo

TL;DR

SemiVL leverages vision-language pre-training and a novel language-guided decoder to enhance semi-supervised semantic segmentation, significantly improving accuracy by integrating rich semantic priors and local reasoning.

Contribution

The paper integrates vision-language models into semi-supervised semantic segmentation through spatial fine-tuning and language guidance, advancing the state of the art.

Findings

01 SemiVL outperforms previous methods on multiple datasets.

02 Achieves a +13.5 mIoU improvement on COCO with limited labels.

03 Achieves a +6.1 mIoU improvement on Pascal VOC with minimal annotations.
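
The findings above are stated in mIoU (mean Intersection-over-Union), the standard semantic segmentation metric, with gains usually read as percentage points. As a minimal sketch of how that score is typically computed, assuming flat lists of per-pixel class indices and an ignore label of 255 (the function name `mean_iou` and its parameters are illustrative and not taken from the SemiVL code):

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over all classes that occur in pred or target.

    pred, target: flat sequences of integer class indices, one per pixel.
    Pixels whose target equals ignore_index are skipped (assumed convention).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            # A correct pixel counts once toward intersection and union.
            intersection[t] += 1
            union[t] += 1
        else:
            # A wrong pixel enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


if __name__ == "__main__":
    score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
    print(f"mIoU = {100 * score:.1f}")
```

For a dataset-level score, the per-class intersection and union counts are normally accumulated over all images before dividing, which corresponds to passing the concatenated pixels of the whole evaluation set to this function.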

Abstract

In semi-supervised semantic segmentation, a model is trained with a limited number of labeled images along with a large corpus of unlabeled images to reduce the high annotation effort. While previous methods are able to learn good segmentation boundaries, they are prone to confuse classes with similar visual appearance due to the limited supervision. On the other hand, vision-language models (VLMs) are able to learn diverse semantic knowledge from image-caption datasets but produce noisy segmentation due to the image-level training. In SemiVL, we propose to integrate rich priors from VLM pre-training into semi-supervised semantic segmentation to learn better semantic decision boundaries. To adapt the VLM from global to local reasoning, we introduce a spatial fine-tuning strategy for label-efficient learning. Further, we design a language-guided decoder to jointly reason over vision and…
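
To make the abstract's point about VLMs and segmentation more concrete, the sketch below shows one common way to obtain dense predictions from a vision-language model: each image patch is assigned the class whose text embedding is most similar to the patch embedding. This is an illustration under assumptions, not SemiVL's language-guided decoder; the functions `cosine` and `classify_patches` are hypothetical, and the toy vectors stand in for real encoder outputs.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify_patches(patch_embeddings, text_embeddings):
    """Label each patch with the index of the most similar class text embedding.

    patch_embeddings: one vector per image patch (assumed vision-encoder output).
    text_embeddings: one vector per class name (assumed text-encoder output).
    """
    labels = []
    for patch in patch_embeddings:
        sims = [cosine(patch, text) for text in text_embeddings]
        labels.append(max(range(len(sims)), key=sims.__getitem__))
    return labels


if __name__ == "__main__":
    # Toy 2-D embeddings: class 0 points along x, class 1 along y.
    text = [[1.0, 0.0], [0.0, 1.0]]
    patches = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.5]]
    print(classify_patches(patches, text))
```

Because each patch is labeled independently from embeddings that were trained with image-level supervision, maps produced this way tend to be noisy, which is the limitation the abstract attributes to VLMs.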

Code & Models

Repositories

google-research/semivl
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Natural Language Processing Techniques