Partial CLIP is Enough: Chimera-Seg for Zero-shot Semantic Segmentation

Jialei Chen; Xu Zheng; Danda Pani Paudel; Luc Van Gool; Hiroshi Murase; Daisuke Deguchi

arXiv:2506.22032 · cs.CV · June 30, 2025

TL;DR

This paper introduces Chimera-Seg, a zero-shot semantic segmentation model that pairs a segmentation backbone with a CLIP-based semantic head to align dense visual features with CLIP's semantic space, improving hIoU by 0.9% and 1.2% on two benchmarks.

Contribution

Chimera-Seg integrates a segmentation model with a CLIP-based semantic head and proposes Selective Global Distillation for better alignment in zero-shot segmentation.

Findings

01. Achieves 0.9% and 1.2% improvements in hIoU on two benchmarks.

02. Effectively aligns dense visual features with CLIP's semantic space.

03. Demonstrates the effectiveness of partial CLIP modules in segmentation.
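
The hIoU figures above are not defined on this page. In zero-shot segmentation work, hIoU usually denotes the harmonic mean of the mIoU on seen classes and the mIoU on unseen classes. The sketch below assumes that definition; the function name `harmonic_iou` and the input values are hypothetical and are not results from the paper.

```python
def harmonic_iou(miou_seen: float, miou_unseen: float) -> float:
    """Harmonic mean of seen-class and unseen-class mIoU (assumed definition of hIoU)."""
    total = miou_seen + miou_unseen
    if total == 0:
        # Both scores are zero, so the harmonic mean is taken to be zero.
        return 0.0
    return 2.0 * miou_seen * miou_unseen / total


# Hypothetical inputs for illustration only, not numbers reported by the authors.
example_hiou = harmonic_iou(80.0, 40.0)
print(round(example_hiou, 2))
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a model cannot score well on hIoU by doing well on seen classes alone.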

Abstract

Zero-shot Semantic Segmentation (ZSS) aims to segment both seen and unseen classes using supervision from only seen classes. Beyond adaptation-based methods, distillation-based approaches transfer the vision-language alignment of a vision-language model, e.g., CLIP, to segmentation models. However, such knowledge transfer remains challenging due to: (1) the difficulty of aligning vision-based features with the textual space, which requires combining spatial precision with vision-language alignment; and (2) the semantic gap between CLIP's global representations and the local, fine-grained features of segmentation models. To address challenge (1), we propose Chimera-Seg, which integrates a segmentation backbone as the body and a CLIP-based semantic head as the head, like the Chimera in Greek mythology, combining spatial precision with vision-language alignment. Specifically, Chimera-Seg…
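
To make "aligning vision-based features with the textual space" concrete, here is a minimal sketch of the generic CLIP-style recipe: each pixel feature is compared with one text embedding per class by cosine similarity and assigned to the closest class. This is not the Chimera-Seg architecture, which the truncated abstract does not specify. The names `l2_normalize` and `classify_pixels`, along with the toy vectors, are hypothetical.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def classify_pixels(
    pixel_feats: Sequence[Sequence[float]],
    text_embeds: Sequence[Sequence[float]],
) -> List[int]:
    """Assign each pixel feature the index of the most similar class text embedding."""
    texts = [l2_normalize(t) for t in text_embeds]
    labels = []
    for feat in pixel_feats:
        f = l2_normalize(feat)
        # Cosine similarity reduces to a dot product once both vectors are unit length.
        sims = [sum(a * b for a, b in zip(f, t)) for t in texts]
        labels.append(max(range(len(sims)), key=sims.__getitem__))
    return labels


# Toy 2-D embeddings standing in for real text and pixel features (illustrative only).
toy_text_embeds = [[1.0, 0.0], [0.0, 1.0]]
toy_pixel_feats = [[1.0, 0.1], [0.2, 2.0]]
toy_labels = classify_pixels(toy_pixel_feats, toy_text_embeds)
```

Because classes are represented only by text embeddings, an unseen class can be added at inference time by appending its embedding to `text_embeds`, provided the pixel features already live in the same space.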

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling