SemiVL: Semi-Supervised Semantic Segmentation with Vision-Language Guidance
Lukas Hoyer, David Joseph Tan, Muhammad Ferjad Naeem, Luc Van Gool, Federico Tombari

TL;DR
SemiVL combines vision-language pre-training with a novel language-guided decoder for semi-supervised semantic segmentation, improving accuracy by bringing in rich semantic priors and adapting the model from global to local reasoning.
Contribution
The paper integrates vision-language models into semi-supervised semantic segmentation through spatial fine-tuning and language guidance, advancing the state of the art.
Findings
SemiVL outperforms previous methods on multiple datasets.
Gains 13.5 mIoU on COCO with limited labels.
Gains 6.1 mIoU on Pascal VOC with minimal annotations.
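The findings above are stated in mIoU (mean intersection-over-union), which segmentation benchmarks usually quote as a percentage. As a reading aid, here is a minimal sketch of how mIoU can be computed for flattened label maps. The function name `mean_iou`, the `ignore_index` value of 255, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration; they are not taken from the paper or its evaluation code.

```python
from typing import Sequence


def mean_iou(
    pred: Sequence[int],
    target: Sequence[int],
    num_classes: int,
    ignore_index: int = 255,  # assumed "void" label; not taken from the paper
) -> float:
    """Mean IoU over the classes that occur in pred or target (value in 0..1)."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixels do not contribute to the score
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # skip classes absent from both prediction and target
            ious.append(intersection[c] / union)
    return sum(ious) / len(ious) if ious else 0.0


# Four pixels, two classes: class 0 has IoU 1/2, class 1 has IoU 2/3.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
print(f"mIoU = {100 * score:.2f}")
```

Benchmark implementations typically accumulate the per-class counts over the whole validation set before dividing, rather than averaging per-image scores; the sketch handles a single flattened label map only.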
Abstract
In semi-supervised semantic segmentation, a model is trained with a limited number of labeled images along with a large corpus of unlabeled images to reduce the high annotation effort. While previous methods are able to learn good segmentation boundaries, they are prone to confuse classes with similar visual appearance due to the limited supervision. On the other hand, vision-language models (VLMs) are able to learn diverse semantic knowledge from image-caption datasets but produce noisy segmentation due to the image-level training. In SemiVL, we propose to integrate rich priors from VLM pre-training into semi-supervised semantic segmentation to learn better semantic decision boundaries. To adapt the VLM from global to local reasoning, we introduce a spatial fine-tuning strategy for label-efficient learning. Further, we design a language-guided decoder to jointly reason over vision and…
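The abstract notes that vision-language models carry semantic knowledge but give noisy segmentation because they are trained at the image level. One generic way to obtain per-pixel class scores from such a model is to compare each pixel embedding with the text embedding of each class name. The sketch below illustrates that idea with plain Python lists. It is not SemiVL's language-guided decoder or its spatial fine-tuning strategy, and the names `similarity_logits` and `predict_classes`, along with the toy embeddings, are hypothetical.

```python
import math
from typing import List, Sequence


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def similarity_logits(
    pixel_embeds: Sequence[Sequence[float]],
    text_embeds: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Cosine similarity between every pixel embedding and every class text embedding."""
    texts = [normalize(t) for t in text_embeds]
    logits = []
    for pixel in pixel_embeds:
        p = normalize(pixel)
        logits.append([sum(a * b for a, b in zip(p, t)) for t in texts])
    return logits


def predict_classes(
    pixel_embeds: Sequence[Sequence[float]],
    text_embeds: Sequence[Sequence[float]],
) -> List[int]:
    """Assign each pixel the class whose text embedding is most similar."""
    return [
        max(range(len(row)), key=row.__getitem__)
        for row in similarity_logits(pixel_embeds, text_embeds)
    ]


# Toy 2-D embeddings standing in for real encoder outputs (illustrative values only).
class_text_embeds = [[1.0, 0.0], [0.0, 1.0]]
pixel_embeds = [[1.0, 0.1], [0.2, 2.0]]
print(predict_classes(pixel_embeds, class_text_embeds))
```

In a real pipeline the embeddings would come from the image and text encoders of a vision-language model, and the resulting similarity maps would typically be refined by a learned decoder rather than used directly.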
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Natural Language Processing Techniques
