Learning Invariant Causal Mechanism from Vision-Language Models
Zeen Song, Siyu Zhao, Xingyu Zhang, Jiangmeng Li, Changwen Zheng, Wenwen Qiang

TL;DR
This paper models CLIP's prediction process using causal inference, demonstrating how focusing on invariant causal factors can improve out-of-distribution robustness, and proposes a new framework called CLIP-ICM.
Contribution
The paper introduces CLIP-ICM, a causal-inference-based method that enhances CLIP's robustness by leveraging invariant causal mechanisms across environments.
Findings
CLIP-ICM significantly improves OOD performance of CLIP.
The paper proves that a linear mapping from CLIP embeddings to invariant factors exists and can be estimated from interventional data.
Experimental validation on multiple OOD datasets shows robustness gains.
Abstract
Contrastive Language-Image Pretraining (CLIP) has achieved remarkable success, but its performance can degrade when fine-tuned in out-of-distribution (OOD) scenarios. We model the prediction process using a Structural Causal Model (SCM) and show that the causal mechanism involving both invariant and variant factors in training environments differs from that in test environments. In contrast, the causal mechanism with solely invariant factors remains consistent across environments. We theoretically prove the existence of a linear mapping from CLIP embeddings to invariant factors, which can be estimated using interventional data. Additionally, we provide a condition to guarantee low OOD risk of the invariant predictor. Based on these insights, we propose the Invariant Causal Mechanism of CLIP (CLIP-ICM) framework. CLIP-ICM involves collecting interventional data, estimating a linear…
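The idea of estimating a linear mapping from interventional data can be illustrated with a small synthetic sketch. This is not the authors' implementation. It assumes that an intervention changes only the variant factors of an embedding, so the difference between an original and an intervened embedding lies in the variant subspace, and the directions with the smallest difference energy are taken as invariant. The function name `estimate_invariant_projection`, the synthetic data, and the eigendecomposition-based estimator are all hypothetical choices made for illustration; real CLIP embeddings are replaced here by random vectors.

```python
import numpy as np

def estimate_invariant_projection(z_orig, z_interv, k):
    """Hypothetical estimator: return a (k, d) matrix whose rows span the
    k directions least affected by the intervention."""
    diffs = z_orig - z_interv                    # (n, d) change caused by the intervention
    scatter = diffs.T @ diffs / len(diffs)       # (d, d) second-moment matrix of the changes
    eigvals, eigvecs = np.linalg.eigh(scatter)   # eigenvalues in ascending order
    return eigvecs[:, :k].T                      # smallest-eigenvalue directions

# Synthetic stand-in for embeddings (assumption: k invariant and d - k variant factors,
# mixed by an unknown orthogonal matrix).
rng = np.random.default_rng(0)
n, d, k = 500, 8, 5
invariant = rng.normal(size=(n, k))
variant_a = rng.normal(size=(n, d - k))          # variant factors before intervention
variant_b = rng.normal(size=(n, d - k))          # variant factors after intervention
mixing = np.linalg.qr(rng.normal(size=(d, d)))[0]

z_orig = np.hstack([invariant, variant_a]) @ mixing.T
z_interv = np.hstack([invariant, variant_b]) @ mixing.T

A = estimate_invariant_projection(z_orig, z_interv, k)

# Largest change in the projected embedding caused by the intervention.
residual = np.abs(A @ z_orig.T - A @ z_interv.T).max()
print(A.shape, residual)
```

In this toy setting the projected embeddings of an original sample and its intervened counterpart coincide up to numerical error, which is the invariance property the abstract refers to. How the paper collects interventional data and fits its mapping may differ from this sketch.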
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling
Methods: Causal inference · Contrastive Language-Image Pre-training
