When and How Does CLIP Enable Domain and Compositional Generalization?

Elias Kempf; Simon Schrodi; Max Argus; Thomas Brox

arXiv:2502.09507 · cs.LG · September 15, 2025

TL;DR

This paper investigates how CLIP's ability to generalize to new domains and compositions depends on training data diversity, revealing that shared representations are key for effective generalization.

Contribution

The paper systematically studies how domain diversity in the training data affects CLIP's generalization and identifies the factors that enable domain and compositional generalization.

Findings

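01. Domain diversity is crucial for both domain and compositional generalization.
02. Compositional generalization can be weaker than domain generalization when the training distribution contains a suboptimal subset of the test domain.
03. Shared intermediate representations are essential for successful generalization.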

Abstract

The remarkable generalization performance of contrastive vision-language models like CLIP is often attributed to the diversity of their training distributions. However, key questions remain unanswered: Can CLIP generalize to an entirely unseen domain when trained on a diverse mixture of domains (domain generalization)? Can it generalize to unseen classes within partially seen domains (compositional generalization)? What factors affect such generalization? To answer these questions, we trained CLIP models on systematically constructed training distributions with controlled domain diversity and object class exposure. Our experiments show that domain diversity is essential for both domain and compositional generalization, yet compositional generalization can be surprisingly weaker than domain generalization when the training distribution contains a suboptimal subset of the test domain.…
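The abstract distinguishes two held-out settings: a test domain that is entirely unseen (domain generalization) and unseen classes within a partially seen test domain (compositional generalization). The sketch below expresses both as sets of (domain, class) pairs. The function `build_split`, the domain and class names, and the choice of which classes count as seen are hypothetical illustrations, not the authors' actual data pipeline.

```python
from itertools import product


def build_split(domains, classes, test_domain, setting, seen_classes=()):
    """Hypothetical helper: return (train_pairs, test_pairs) as sets of (domain, class).

    setting="domain":        the test domain is held out of training entirely.
    setting="compositional": only `seen_classes` from the test domain appear in
                             training; the test domain's remaining classes are evaluated.
    """
    if test_domain not in domains:
        raise ValueError(f"unknown test domain: {test_domain!r}")
    seen = set(seen_classes)
    # Every other domain contributes all of its classes to the training mixture.
    train = {(d, c) for d, c in product(domains, classes) if d != test_domain}
    if setting == "domain":
        # Domain generalization: evaluate on every class of the unseen domain.
        test = {(test_domain, c) for c in classes}
    elif setting == "compositional":
        # Compositional generalization: the test domain is partially seen ...
        train |= {(test_domain, c) for c in classes if c in seen}
        # ... and evaluation uses only the classes never seen in that domain.
        test = {(test_domain, c) for c in classes if c not in seen}
    else:
        raise ValueError(f"unknown setting: {setting!r}")
    return train, test


domains = ["clipart", "painting", "real", "sketch"]  # hypothetical domain names
classes = ["cat", "dog", "car", "tree"]              # hypothetical class names

train_dg, test_dg = build_split(domains, classes, "sketch", "domain")
train_cg, test_cg = build_split(domains, classes, "sketch", "compositional",
                                seen_classes=["cat", "dog"])
print(len(train_dg), len(test_dg))      # 12 4
print(len(train_cg), sorted(test_cg))   # 14 [('sketch', 'car'), ('sketch', 'tree')]
```

In both settings the test classes still appear in other training domains; what differs is whether the test domain itself contributes any training data. Varying how many entries of `domains` feed the training mixture is one simple way to control domain diversity in the spirit of the setup described above.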

Videos

When and How Does CLIP Enable Domain and Compositional Generalization? (SlidesLive)

Taxonomy

Topics: Fuzzy Logic and Control Systems · Advanced Algebra and Logic · Rough Sets and Fuzzy Logic

Methods: Contrastive Language-Image Pre-training
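Contrastive Language-Image Pre-training (CLIP), the method listed above, trains an image encoder and a text encoder so that matched image-caption pairs in a batch score higher than mismatched ones. The following is a minimal pure-Python sketch of that symmetric contrastive objective on precomputed embeddings. It assumes a fixed temperature of 0.07 (CLIP treats the temperature as a learned parameter), and the helper names are hypothetical, not taken from the paper or any library.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cross_entropy(logits, target):
    """Numerically stable -log softmax(logits)[target]."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def clip_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of matched (image, text) pairs.

    Row i of image_emb and row i of text_emb are assumed to describe the same
    example; every other pairing in the batch serves as a negative.
    """
    imgs = [l2_normalize(v) for v in image_emb]
    txts = [l2_normalize(v) for v in text_emb]
    n = len(imgs)
    # logits[i][j] = cosine(image i, text j) / temperature
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image-to-text: pick the correct caption for each image (rows).
    loss_i2t = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # Text-to-image: pick the correct image for each caption (columns).
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = sum(cross_entropy(columns[j], j) for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


print(clip_loss([[1, 0], [0, 1]], [[1, 0], [0, 1]]))  # close to 0: pairs aligned
print(clip_loss([[1, 0], [0, 1]], [[0, 1], [1, 0]]))  # roughly 14.29 (= 1 / 0.07)
```

Averaging the image-to-text and text-to-image terms keeps the objective symmetric in the two modalities, so neither encoder is favored.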