Open-Vocabulary Segmentation with Semantic-Assisted Calibration
Yong Liu, Sule Bai, Guanbin Li, Yitong Wang, Yansong Tang

TL;DR
This paper introduces SCAN, a calibration network that uses CLIP's generalized semantic prior and a contextual shift strategy to improve open-vocabulary segmentation. It reports state-of-the-art results and proposes a new evaluation metric, SG-IoU.
Contribution
The paper proposes SCAN (Semantic-assisted CAlibration Network), which improves open-vocabulary segmentation by incorporating CLIP's generalized semantic prior into proposal embeddings and applying a contextual shift strategy.
Findings
SCAN achieves state-of-the-art performance on open-vocabulary segmentation benchmarks.
The proposed SG-IoU metric accounts for semantic duplication across categories during evaluation.
Incorporating the semantic prior and the contextual shift strategy improves segmentation accuracy.
Abstract
This paper studies open-vocabulary segmentation (OVS) through calibrating in-vocabulary and domain-biased embedding space with generalized contextual prior of CLIP. As the core of open-vocabulary understanding, alignment of visual content with the semantics of unbounded text has become the bottleneck of this field. To address this challenge, recent works propose to utilize CLIP as an additional classifier and aggregate model predictions with CLIP classification results. Despite their remarkable progress, performance of OVS methods in relevant scenarios is still unsatisfactory compared with supervised counterparts. We attribute this to the in-vocabulary embedding and domain-biased CLIP prediction. To this end, we present a Semantic-assisted CAlibration Network (SCAN). In SCAN, we incorporate generalized semantic prior of CLIP into proposal embedding to avoid collapsing on known…
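The abstract notes that recent works use CLIP as an additional classifier and aggregate the model's predictions with CLIP's classification results. The following is a minimal sketch of one common way to do such aggregation, a weighted geometric mean of two per-mask class distributions. It is not SCAN's calibration method; the function names, the `weight` parameter, the class names, and the logit values are all hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(model_probs, clip_probs, weight=0.5):
    """Fuse two class distributions with a weighted geometric mean.

    weight=0 keeps only the segmentation model's prediction,
    weight=1 keeps only CLIP's prediction (hypothetical parameter).
    """
    if len(model_probs) != len(clip_probs):
        raise ValueError("both distributions must cover the same classes")
    fused = [
        (p ** (1.0 - weight)) * (q ** weight)
        for p, q in zip(model_probs, clip_probs)
    ]
    total = sum(fused)
    return [f / total for f in fused]  # renormalize so the result sums to 1


# Illustrative values for a single mask proposal (assumed, not from the paper).
class_names = ["cat", "dog", "sofa"]
model_probs = softmax([2.0, 1.0, 0.1])  # in-vocabulary classifier scores
clip_probs = softmax([0.5, 2.5, 0.2])   # CLIP scores for the masked region

fused = aggregate(model_probs, clip_probs, weight=0.5)
predicted = class_names[max(range(len(fused)), key=fused.__getitem__)]
print(predicted, [round(f, 3) for f in fused])
```

In this sketch, `weight` controls how much the fused prediction trusts CLIP over the segmentation model. The abstract argues that this kind of aggregation still suffers from in-vocabulary embeddings and domain-biased CLIP predictions, which is the gap SCAN is designed to address.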
Taxonomy
Topics: Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques · Natural Language Processing Techniques
Methods: Focus · Contrastive Language-Image Pre-training
