Open-Vocabulary Segmentation with Semantic-Assisted Calibration

Yong Liu; Sule Bai; Guanbin Li; Yitong Wang; Yansong Tang

arXiv:2312.04089·cs.CV·November 27, 2024·2 cites

TL;DR

This paper introduces SCAN, a novel calibration network that leverages CLIP's semantic prior and contextual shift to improve open-vocabulary segmentation, achieving state-of-the-art results and proposing a new evaluation metric.

Contribution

The paper proposes SCAN, a semantic-assisted calibration network that enhances open-vocabulary segmentation by integrating CLIP's semantic prior and a contextual shift strategy.

Findings

1. SCAN achieves state-of-the-art performance on open-vocabulary segmentation benchmarks.

2. The proposed SG-IoU metric accounts for semantic duplication across categories during evaluation.

3. Incorporating semantic prior and contextual shift improves segmentation accuracy.

Abstract

This paper studies open-vocabulary segmentation (OVS) by calibrating the in-vocabulary and domain-biased embedding space with the generalized contextual prior of CLIP. As the core of open-vocabulary understanding, the alignment of visual content with the semantics of unbounded text has become the bottleneck of this field. To address this challenge, recent works propose to use CLIP as an additional classifier and aggregate model predictions with CLIP classification results. Despite their remarkable progress, the performance of OVS methods in relevant scenarios is still unsatisfactory compared with supervised counterparts. We attribute this to the in-vocabulary embedding and domain-biased CLIP prediction. To this end, we present a Semantic-assisted CAlibration Network (SCAN). In SCAN, we incorporate the generalized semantic prior of CLIP into the proposal embedding to avoid collapsing on known…
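The abstract notes that recent works use CLIP as an additional classifier and aggregate the segmentation model's predictions with CLIP's classification results. The sketch below illustrates one possible form of that aggregation for a single mask proposal. The weighted geometric mean, the function names, the `weight` parameter, and the logits are all assumptions made for illustration; none of them comes from the paper, and this is not SCAN's calibration method.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(seg_probs, clip_probs, weight=0.5):
    """Fuse two per-category distributions for one mask proposal.

    Hypothetical fusion rule (an assumption, not taken from the paper):
    a weighted geometric mean followed by renormalization.
    weight=0 keeps only the segmentation model's prediction,
    weight=1 keeps only the CLIP prediction.
    """
    if len(seg_probs) != len(clip_probs):
        raise ValueError("both distributions must cover the same categories")
    fused = [
        (p ** (1.0 - weight)) * (q ** weight)
        for p, q in zip(seg_probs, clip_probs)
    ]
    total = sum(fused)
    if total == 0.0:
        raise ValueError("fused scores are all zero")
    return [f / total for f in fused]


# Illustrative values only: made-up logits for three categories.
categories = ["cat", "dog", "sofa"]
seg_probs = softmax([2.0, 0.5, 0.1])   # in-vocabulary classifier output
clip_probs = softmax([0.3, 2.5, 0.2])  # CLIP classification output

fused = aggregate(seg_probs, clip_probs, weight=0.5)
best = categories[max(range(len(fused)), key=fused.__getitem__)]
```

In this toy setup, `weight` controls how much the fused score trusts CLIP relative to the trained classifier. The abstract argues that both sources are imperfect (in-vocabulary embedding and domain-biased CLIP prediction), which is the motivation it gives for calibrating them rather than only mixing their outputs.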

Taxonomy

Topics: Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques · Natural Language Processing Techniques

Methods: Focus · Contrastive Language-Image Pre-training