CLAP: Isolating Content from Style through Contrastive Learning with Augmented Prompts

Yichao Cai; Yuhang Liu; Zhen Zhang; Javen Qinfeng Shi

arXiv:2311.16445·cs.CV·April 24, 2025·1 cites

TL;DR

This paper introduces a contrastive learning approach with data augmentation to disentangle content from style in vision-language models, improving their generalization, robustness, and performance in zero-shot and few-shot tasks.

Contribution

It proposes a novel method to extract pure content features by integrating image and text augmentation into pre-trained CLIP-like models, enhancing their representation quality.

Findings

01 Significant improvements in zero-shot classification accuracy.
02 Enhanced robustness to data perturbations.
03 Better disentanglement of content and style features.

Abstract

Contrastive vision-language models, such as CLIP, have garnered considerable attention for various downstream tasks, mainly due to the remarkable generalization ability of the learned features. However, the features they learn often blend content and style information, which somewhat limits their generalization capabilities under distribution shifts. To address this limitation, we adopt a causal generative perspective for multimodal data and propose contrastive learning with data augmentation to disentangle content features from the original representations. To achieve this, we begin by exploring image augmentation techniques and develop a method to seamlessly integrate them into pre-trained CLIP-like models to extract pure content features. Taking a step further, recognizing the inherent semantic richness and logical structure of text data, we explore the use of text…
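The core idea in the abstract, pulling a sample and its style-augmented version together while pushing other samples apart, can be illustrated with a small InfoNCE-style loss. This is a minimal sketch in pure Python and not the authors' implementation: the toy feature layout (two content dimensions followed by two style dimensions), the hand-written `keep_content` projection, and the temperature value are all assumptions made for illustration. In the paper a network is trained to extract content features; here the projection is fixed by hand only to show why a content-only representation scores better under this kind of loss.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss: anchors[i] should match positives[i], not positives[j != i]."""
    anchors = [normalize(a) for a in anchors]
    positives = [normalize(p) for p in positives]
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [dot(a, p) / temperature for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(anchors)

# Hypothetical toy features: [content_0, content_1, style_0, style_1].
# Each augmented sample keeps the content of its original but changes the style.
original = [
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [-1.0, 0.0, 1.0, 1.0],
]
augmented = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, -1.0, 0.0],
    [-1.0, 0.0, 1.0, -1.0],
]

def keep_content(v):
    """Hand-written stand-in for a learned content extractor (assumption)."""
    return v[:2]

loss_entangled = info_nce(original, augmented)
loss_content = info_nce(
    [keep_content(v) for v in original],
    [keep_content(v) for v in augmented],
)

print("entangled features:", loss_entangled)
print("content-only features:", loss_content)
```

In this toy setup the content-only features give the lower loss, because the style dimensions no longer reduce the similarity between a sample and its augmented counterpart.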

Code & Models

Repositories

YichaoCai1/CLAP
PyTorch · Official

Taxonomy

Topics

Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Cancer-related molecular mechanisms research

Methods

Contrastive Learning · Contrastive Language-Image Pre-training