Deciphering the Role of Representation Disentanglement: Investigating Compositional Generalization in CLIP Models

Reza Abbasi; Mohammad Hossein Rohban; Mahdieh Soleymani Baghshah

arXiv:2407.05897 · cs.CV · July 17, 2024

TL;DR

This paper investigates how representation disentanglement affects compositional generalization in CLIP models, using a carefully synthesized dataset to evaluate true out-of-distribution performance and identify key factors for improvement.

Contribution

The paper introduces a novel dataset for authentic compositional out-of-distribution (C-OoD) evaluation and demonstrates that disentangled representations are crucial for CLIP's compositional generalization.

Findings

01. Disentanglement correlates with better C-OoD performance
02. Varying C-OoD generalization observed across CLIP models
03. Disentanglement metrics can predict generalization capabilities

Abstract

CLIP models have recently been shown to exhibit Out of Distribution (OoD) generalization capabilities. However, Compositional Out of Distribution (C-OoD) generalization, a crucial aspect of a model's ability to understand unseen compositions of known concepts, remains relatively unexplored for CLIP models. Our goal is to address this gap and identify the factors that contribute to C-OoD generalization in CLIP models. We note that previous studies of the compositional understanding of CLIP models frequently fail to ensure that test samples are genuinely novel relative to the CLIP training data. To this end, we carefully synthesized a large and diverse dataset in the single-object setting, comprising attributes for objects that are highly unlikely to be encountered in the combined training datasets of various CLIP models. This dataset enables an authentic evaluation of C-OoD generalization. Our…
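
The evaluation setting described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it assumes image and caption embeddings have already been produced by some CLIP-style encoder, and it substitutes random toy vectors so that it runs with NumPy alone. The captions, function names, and dimensions are hypothetical and chosen only for illustration.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def zero_shot_predict(image_emb, text_emb):
    """For each image, return the index of the caption with highest cosine similarity."""
    sims = l2_normalize(image_emb) @ l2_normalize(text_emb).T
    return sims.argmax(axis=1)


def accuracy(pred, target):
    """Fraction of images whose predicted caption matches the true one."""
    return float(np.mean(pred == target))


# Hypothetical captions pairing known objects with unusual attributes.
captions = [
    "a photo of a purple banana",
    "a photo of a furry teapot",
    "a photo of a transparent horse",
    "a photo of a wooden cloud",
]

# Toy stand-ins for encoder outputs (assumption: a real run would use a CLIP
# text encoder and image encoder here instead of random vectors).
rng = np.random.default_rng(0)
text_emb = rng.normal(size=(len(captions), 16))
labels = np.array([0, 1, 2, 3, 0, 1])  # true caption index per image
image_emb = text_emb[labels] + 0.01 * rng.normal(size=(len(labels), 16))

pred = zero_shot_predict(image_emb, text_emb)
print("C-OoD zero-shot accuracy on toy data:", accuracy(pred, labels))
```

With real encoders, the same accuracy computed over genuinely unseen attribute-object compositions would serve as one C-OoD score per model, which could then be compared against a disentanglement measure of that model's embeddings.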

Code & Models

Repositories

abbasiReza/CLIP-COoD (Official)

Taxonomy

Topics: Italy: Economic History and Contemporary Issues

Methods: Contrastive Language-Image Pre-training