LCCo: Lending CLIP to Co-Segmentation

Xin Duan; Yan Yang; Liyuan Pan; Xiabi Liu

arXiv:2308.11506 · cs.CV · August 23, 2023

TL;DR

This paper introduces LCCo, a novel co-segmentation method leveraging CLIP's language-image pre-training to improve semantic object segmentation across image sets without requiring extra labeled data.

Contribution

LCCo integrates CLIP into a segmentation framework with three modules, enabling effective co-segmentation without relying on additional supervision or complex network engineering.

Findings

1. Outperforms state-of-the-art methods on four benchmarks
2. Effectively leverages CLIP for semantic consistency
3. Refines features in a coarse-to-fine manner

Abstract

This paper studies co-segmenting the common semantic object in a set of images. Existing works either rely on carefully engineered networks to mine the implicit semantic information in visual features or require extra data (i.e., classification labels) for training. In this paper, we leverage the contrastive language-image pre-training framework (CLIP) for the task. With a backbone segmentation network that independently processes each image from the set, we introduce semantics from CLIP into the backbone features, refining them in a coarse-to-fine manner with three key modules: i) an image set feature correspondence module, encoding globally consistent semantic information of the image set; ii) a CLIP interaction module, using CLIP-mined common semantics of the image set to refine the backbone feature; iii) a CLIP regularization module, drawing CLIP towards this co-segmentation task,…
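
The idea of refining backbone features with set-level semantics can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify how the modules are built, so the mean pooling, the cosine-similarity weighting, and the function names `common_semantics` and `refine_features` are all assumptions made for illustration, with plain lists standing in for real CLIP embeddings and backbone feature maps.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def common_semantics(image_embeddings):
    """Pool per-image embeddings into one set-level vector.

    Assumption: simple mean pooling stands in for whatever the paper
    uses to mine the common semantics of the image set.
    """
    count = len(image_embeddings)
    dim = len(image_embeddings[0])
    return [sum(e[i] for e in image_embeddings) / count for i in range(dim)]


def refine_features(pixel_features, semantic_vector):
    """Scale each pixel feature by its similarity to the set-level vector.

    Assumption: a clamped cosine weight is a hypothetical refinement rule,
    chosen only to show features that agree with the common semantics
    being kept and the rest being suppressed.
    """
    refined = []
    for feature in pixel_features:
        weight = max(0.0, cosine(feature, semantic_vector))
        refined.append([weight * x for x in feature])
    return refined


# Toy data: two images whose embeddings agree on the first dimension.
set_vector = common_semantics([[1.0, 0.0], [1.0, 0.0]])
refined = refine_features([[2.0, 0.0], [0.0, 3.0]], set_vector)
```

In this toy case the first pixel feature aligns with the set-level vector and passes through unchanged, while the second is orthogonal to it and is zeroed out.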

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications

Methods: Contrastive Language-Image Pre-training