Understanding the Robustness of Multi-modal Contrastive Learning to Distribution Shift

Yihao Xue; Siddharth Joshi; Dang Nguyen; Baharan Mirzasoleiman

arXiv:2310.04971 · cs.LG · March 19, 2024 · 2 cites

TL;DR

This paper analyzes why multimodal contrastive learning methods like CLIP are robust to distribution shifts, revealing mechanisms such as intra-class contrasting and inter-class feature sharing that enhance generalization.
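For reference, methods like CLIP are usually trained with a symmetric contrastive (InfoNCE) objective over a batch of image-caption pairs. The formula below is a sketch of that common CLIP-style formulation, not necessarily the exact variant analyzed in the paper. The notation is assumed here: $f$ and $g$ are the image and text encoders with unit-normalized outputs, $(x_i, t_i)$ for $i = 1, \dots, n$ are the paired images and captions in a batch, and $\tau > 0$ is a temperature.

$$
\mathcal{L} = -\frac{1}{2n}\sum_{i=1}^{n}\left[\log\frac{\exp\left(\langle f(x_i), g(t_i)\rangle/\tau\right)}{\sum_{j=1}^{n}\exp\left(\langle f(x_i), g(t_j)\rangle/\tau\right)} + \log\frac{\exp\left(\langle f(x_i), g(t_i)\rangle/\tau\right)}{\sum_{j=1}^{n}\exp\left(\langle f(x_j), g(t_i)\rangle/\tau\right)}\right]
$$

The first term pulls each image toward its own caption and away from the other $n-1$ captions in the batch; the second term does the same in the text-to-image direction.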

Contribution

It uncovers the mechanisms behind the robustness of multimodal contrastive learning (MMCL), provides theoretical insight into the role of rich captions, and validates these findings with experiments on synthetic and real-world data.

Findings

1. Intra-class contrasting learns high-variance features.
2. Inter-class feature sharing improves generalization.
3. Rich captions enhance robustness to distribution shifts.

Abstract

Recently, multimodal contrastive learning (MMCL) approaches, such as CLIP, have achieved remarkable success in learning representations that are robust against distribution shift and generalize to new domains. Despite this empirical success, the mechanism behind learning such generalizable representations is not understood. In this work, we rigorously analyze this problem and uncover two mechanisms behind MMCL's robustness: intra-class contrasting, which allows the model to learn features with high variance, and inter-class feature sharing, where annotated details in one class help the model learn other classes better. Both mechanisms prevent spurious features that are over-represented in the training data from overshadowing the generalizable core features. This yields superior zero-shot classification accuracy under distribution shift. Furthermore, we theoretically demonstrate the…
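The two ingredients the abstract relies on, contrastive training on image-caption pairs and zero-shot classification, can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: plain arrays stand in for the outputs of image and text encoders (no real CLIP model is loaded), the helper names `l2_normalize`, `clip_style_loss`, and `zero_shot_predict` are hypothetical, and the temperature `tau=0.07` is an assumed value. It follows the common CLIP recipe rather than the paper's own theoretical model.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit L2 norm so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)


def _mean_diag_nll(logits):
    """Row-wise softmax cross-entropy where the correct column for row i is i."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))


def clip_style_loss(img_emb, txt_emb, tau=0.07):
    """Symmetric contrastive (InfoNCE) loss over n paired embeddings.

    Row i of img_emb is the positive partner of row i of txt_emb; the other
    n - 1 rows in the batch act as negatives. tau is an assumed temperature.
    """
    logits = l2_normalize(img_emb) @ l2_normalize(txt_emb).T / tau  # shape (n, n)
    # Rows give the image-to-text direction, the transpose gives text-to-image.
    return 0.5 * (_mean_diag_nll(logits) + _mean_diag_nll(logits.T))


def zero_shot_predict(img_emb, class_txt_emb):
    """For each image, pick the class whose prompt embedding is most cosine-similar."""
    sims = l2_normalize(img_emb) @ l2_normalize(class_txt_emb).T  # (n_images, n_classes)
    return sims.argmax(axis=1)


# Usage with hand-picked stand-in embeddings (3 classes in a 3-dim space).
class_txt = np.eye(3)  # hypothetical embeddings of prompts like "a photo of a {class}"
images = np.array([[0.1, 0.2, 0.9], [0.8, 0.1, 0.1], [0.2, 0.7, 0.1]])
print(zero_shot_predict(images, class_txt))  # → [2 0 1]
print(clip_style_loss(np.eye(4), np.eye(4)))  # near 0: every pair is matched and well separated
print(clip_style_loss(np.ones((4, 5)), np.ones((4, 5))))  # log(4), about 1.386: pairs are indistinguishable
```

Normalizing both sides makes the loss and the zero-shot rule depend only on cosine similarity, which is the usual CLIP convention.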

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning

Methods: Contrastive Learning · Contrastive Language-Image Pre-training