DCP-CLIP: A Coarse-to-Fine Framework for Open-Vocabulary Semantic Segmentation with Dual Interaction

Jing Wang; Huimin Shi; Quan Zhou; Qibo Liu; Suofei Zhang; Huimin Lu

arXiv:2603.13951·cs.CV·March 17, 2026

Open Access

TL;DR

DCP-CLIP introduces a coarse-to-fine framework for open-vocabulary semantic segmentation that dynamically constructs textual features and models dual interactions, improving accuracy and efficiency over existing methods.

Contribution

The paper proposes a novel dynamic textual feature construction and dual interaction modeling framework for OVSS, addressing cross-modal communication and computational efficiency issues.

Findings

01. Outperforms existing OVSS methods in accuracy

02. Achieves higher efficiency in semantic segmentation

03. Demonstrates effectiveness on multiple benchmarks

Abstract

Recent years have witnessed remarkable progress in open-vocabulary semantic segmentation (OVSS) using visual-language foundation models, yet existing methods still suffer from the following fundamental challenges: (1) insufficient cross-modal communication between textual and visual spaces, and (2) significant computational cost from interactions with a massive number of categories. To address these issues, this paper describes a novel coarse-to-fine framework, called DCP-CLIP, for OVSS. Unlike prior efforts that mainly relied on pre-established category content and the inherent spatial-class interaction capability of CLIP, we dynamically construct category-relevant textual features and explicitly model dual interactions between spatial image features and textual class semantics. Specifically, we first leverage CLIP's open-vocabulary recognition capability to identify semantic categories…
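The coarse-to-fine idea in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the abstract is truncated, so the selection rule used here (top-k cosine similarity between a global image embedding and class text embeddings) is an assumption, as are the function names `select_candidate_categories` and `dense_class_scores` and the random arrays standing in for CLIP features. It only shows why shortlisting categories first shrinks the dense per-pixel comparison from all classes to k classes.

```python
import numpy as np


def select_candidate_categories(image_emb, text_embs, top_k=5):
    """Coarse step (assumed): keep the top_k classes most similar to the image."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    scores = text_embs @ image_emb            # (num_classes,) cosine similarities
    top_k = min(top_k, scores.shape[0])
    order = np.argsort(-scores)[:top_k]       # indices sorted by descending score
    return order, scores[order]


def dense_class_scores(pixel_feats, text_embs):
    """Fine step (assumed): per-pixel cosine similarity to the kept classes.

    pixel_feats: (H, W, D), text_embs: (K, D) -> result: (H, W, K)
    """
    p = pixel_feats / np.linalg.norm(pixel_feats, axis=-1, keepdims=True)
    t = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    return p @ t.T


# Random placeholders for CLIP outputs (hypothetical sizes).
rng = np.random.default_rng(0)
num_classes, dim, h, w = 150, 64, 8, 8
text_embs = rng.normal(size=(num_classes, dim))
image_emb = rng.normal(size=dim)
pixel_feats = rng.normal(size=(h, w, dim))

kept, kept_scores = select_candidate_categories(image_emb, text_embs, top_k=5)
fine = dense_class_scores(pixel_feats, text_embs[kept])   # (8, 8, 5), not (8, 8, 150)
seg = kept[np.argmax(fine, axis=-1)]                      # map back to original class ids
```

With these placeholder sizes the dense comparison covers 5 classes per pixel instead of 150; how DCP-CLIP actually selects categories and models the dual interactions is described in the full paper.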

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning