What do we learn from inverting CLIP models?

Hamid Kazemi; Atoosa Chegini; Jonas Geiping; Soheil Feizi; Tom Goldstein

arXiv:2403.02580·cs.CV·March 6, 2024·1 cites

Open Access · 1 Repo · 4 Reviews

TL;DR

This paper investigates CLIP models through inversion, showing that inverted images align semantically with their target prompts and exposing biases, including NSFW content that appears even for benign prompts.

Contribution

It introduces an inversion-based method to analyze CLIP models, providing new insights into their semantic capabilities and biases.

Findings

01 · Inverted images align semantically with prompts.

02 · CLIP models can blend concepts and exhibit biases.

03 · NSFW images can appear even with innocuous prompts.

Abstract

We employ an inversion-based approach to examine CLIP models. Our examination reveals that inverting CLIP models results in the generation of images that exhibit semantic alignment with the specified target prompts. We leverage these inverted images to gain insights into various aspects of CLIP models, such as their ability to blend concepts and inclusion of gender biases. We notably observe instances of NSFW (Not Safe For Work) images during model inversion. This phenomenon occurs even for semantically innocuous prompts, like "a beautiful landscape," as well as for prompts involving the names of celebrities.
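
The general idea of inversion can be illustrated with a toy sketch: start from noise and run gradient ascent on the input so that its embedding moves toward a target embedding under cosine similarity. Everything below is an assumption made for illustration and is not the authors' implementation. `toy_encoder` is a hypothetical fixed linear map standing in for CLIP's image encoder, the random `target` vector stands in for a text-prompt embedding, and the step count and learning rate are arbitrary. The paper's actual procedure (including the transformations one reviewer mentions) is not described on this page and is not reproduced here.

```python
import numpy as np

def toy_encoder(d_embed=16, d_image=64, seed=0):
    """Hypothetical stand-in for an image encoder: a fixed linear map z = W x."""
    rng = np.random.default_rng(seed)
    # QR gives orthonormal columns; transposing yields orthonormal rows,
    # which keeps this toy problem well conditioned.
    q, _ = np.linalg.qr(rng.normal(size=(d_image, d_embed)))
    return q.T  # shape (d_embed, d_image)

def invert(W, target, steps=200, lr=0.05, seed=1):
    """Gradient ascent on the input x so that cos(W x, target) increases."""
    rng = np.random.default_rng(seed)
    t = target / np.linalg.norm(target)      # unit-norm target embedding
    x = 0.1 * rng.normal(size=W.shape[1])    # start from low-amplitude noise
    sims = []
    for _ in range(steps):
        z = W @ x
        n = np.linalg.norm(z)
        sims.append(float(z @ t / n))        # similarity before this update
        grad_z = t / n - (z @ t) * z / n**3  # d cos(z, t) / dz for unit t
        x = x + lr * (W.T @ grad_z)          # chain rule through z = W x
    z = W @ x
    sims.append(float(z @ t / np.linalg.norm(z)))  # similarity after last update
    return x, sims

W = toy_encoder()
target = np.random.default_rng(2).normal(size=W.shape[0])  # stand-in "text embedding"
x, sims = invert(W, target)
print(f"cosine similarity: {sims[0]:.3f} -> {sims[-1]:.3f}")
```

With a real CLIP model the linear map would be replaced by the image encoder and the gradient would come from automatic differentiation. The loop structure (encode, score against the prompt embedding, update the pixels) is the part this sketch is meant to convey.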

Peer Reviews

Decision·ICLR 2025 Conference Withdrawn Submission

Reviewer 01 · Rating 3 · Confidence 4

Strengths

- The analysis shows that CLIP models trained on more data are more amenable to image inversion (better quality of inverted images

Weaknesses

- No novelty in methodology or findings. The problem of image inversion has been studied in the context of discriminative (deepdream and papers cited in the submission) and generative models (e.g., https://arxiv.org/abs/2405.15012, https://dl.acm.org/doi/abs/10.1145/3372297.3417270). The problem of identifying CLIP biases has also been studied in the past (e.g., https://arxiv.org/abs/2311.05746, https://ojs.aaai.org/index.php/AIES/article/view/31657 and references therein).
- Fairly

Reviewer 02 · Rating 5 · Confidence 3

Strengths

This work presents an interesting analysis on interpreting embedding-based image features from the widely popular CLIP model (where the information about training data is proprietary). It flags important drawbacks elicited via model inversion which point out potential flaws in the training data, flagging an important issue given that CLIP image embeddings are widely used. Further, the authors have aptly presented the different implications in an organized and coherent fashion making it easy to f

Weaknesses

While the work presents an interesting analysis, it is unclear how these insights can be concretely leveraged to improve image generation pipelines as of today. Can we inform any of the following? a) Modelling strategies and making models more robust to such potentially bad data points? Any kind of safety finetuning? b) data curation strategies if any? Further, some modelling choices (e.g. choice of transformations) are not well motivated. These questions have been outlined in the next section.

Reviewer 03 · Rating 6 · Confidence 4

Strengths

The paper presents a solid analysis of CLIP models through a novel approach using model inversion. The paper is well written and the motivation is clear. The problem studied in the paper is timely and will be of good use to the community. To my knowledge, the idea is indeed novel. Specifically, the authors clearly demonstrate that CLIP models possess the capability to blend concepts, akin to generative models like DALLE and IMAGEN. Their study reveals associations between seemingly harmless

Weaknesses

Overall, the paper appears to be poorly formatted, giving the impression of being rushed without proper attention to formatting guidelines. For example, Table 4 is misaligned and requires reformatting, and there are 10 unnecessary empty lines between Figure 2 and the text. The same issue occurs with Table 6 in the appendix. This is disappointing, as the text within the paper is well-written and addresses an important topic. Additionally, while the experiments in Table 4 seem convincing, I belie

Reviewer 04 · Rating 3 · Confidence 4

Strengths

- Applying model inversion to CLIP models is a suitable way to obtain insights into its proprietary and unavailable training data.
- The paper reveals that the CLIP training data was not cleaned from potentially harmful content.
- It is interesting to see that CLIP model inversion can produce (somewhat) coherent objects and text

Weaknesses

- While CLIP model inversion is interesting for the sake of scientific curiosity, the paper does not discuss any practical implications on downstream tasks, such as retrieval, classification, segmentation, text-to-image modeling. Sec. 7 (l. 470) even states that "these behaviors do not have to be represented in other operational modes".
- The paper makes rather strong claims which are mostly supported by few qualitative examples. Furthermore, the experimental analysis is not thorough enough. Th

Code & Models

Repositories

hamidkazemi22/clipinversion
pytorch · Official

Taxonomy

Topics: Multi-Agent Systems and Negotiation · Global Financial Regulation and Crises · Economic Policies and Impacts

Methods: Contrastive Language-Image Pre-training