Rewrite Caption Semantics: Bridging Semantic Gaps for Language-Supervised Semantic Segmentation

Yun Xing; Jian Kang; Aoran Xiao; Jiahao Nie; Ling Shao; Shijian Lu

arXiv:2309.13505 · cs.CV · January 5, 2024 · 5 cites

TL;DR

This paper introduces Concept Curation (CoCu), a method that uses CLIP to bridge semantic gaps between visual concepts and textual descriptions, significantly improving zero-shot language-supervised semantic segmentation performance.

Contribution

The paper proposes CoCu, a novel pipeline that enhances language-supervised segmentation by compensating missing semantics using CLIP and concept curation techniques.

Findings

01 Achieves state-of-the-art zero-shot segmentation performance

02 Significantly boosts baseline results across 8 benchmarks

03 Demonstrates the effectiveness of bridging semantic gaps in pre-training

Abstract

Vision-Language Pre-training has demonstrated remarkable zero-shot recognition ability and the potential to learn generalizable visual representations from language supervision. Taking a step further, language-supervised semantic segmentation enables spatial localization of textual inputs by learning pixel grouping solely from image-text pairs. Nevertheless, state-of-the-art methods suffer from clear semantic gaps between the visual and textual modalities: many visual concepts that appear in images are missing from their paired captions. This semantic misalignment propagates through pre-training and leads to inferior zero-shot performance in dense prediction, because the textual representations capture too few visual concepts. To close this semantic gap, we propose Concept Curation (CoCu), a pipeline that leverages CLIP to compensate for the missing semantics. For each image-text pair, we establish…
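The idea of compensating for concepts that are visible in an image but absent from its caption can be illustrated with a toy sketch. This is not the CoCu pipeline, whose details are cut off in the abstract above. It only assumes that an image and a set of candidate concept names have already been embedded in a shared space (as a CLIP-style model would do), then keeps the concepts that score well against the image and do not already occur in the caption. The function names, the hand-written 3-dimensional embeddings, and the `top_k` and `threshold` parameters are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def missing_concepts(image_emb, concept_embs, caption, top_k=2, threshold=0.5):
    """Return up to top_k concept names that match the image but not the caption.

    Hypothetical helper: a concept is kept when its similarity to the image
    embedding is at least `threshold` and its name is not a word in the caption.
    """
    caption_words = set(caption.lower().split())
    scored = []
    for name, emb in concept_embs.items():
        if name.lower() in caption_words:
            continue  # the caption already mentions this concept
        score = cosine(image_emb, emb)
        if score >= threshold:
            scored.append((score, name))
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]


# Toy embeddings standing in for the output of a real image/text encoder.
image_emb = [1.0, 1.0, 0.0]
concept_embs = {
    "dog": [1.0, 0.0, 0.0],
    "grass": [0.0, 1.0, 0.0],
    "car": [0.0, 0.0, 1.0],
}
caption = "a dog playing outside"

print(missing_concepts(image_emb, concept_embs, caption))
```

In this toy setup the caption already names "dog", "car" is orthogonal to the image embedding, and "grass" is the only concept returned as a candidate to add back.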

Code & Models

Videos

Rewrite Caption Semantics: Bridging Semantic Gaps for Language-Supervised Semantic Segmentation · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: Contrastive Language-Image Pre-training