What do we learn from inverting CLIP models?
Hamid Kazemi, Atoosa Chegini, Jonas Geiping, Soheil Feizi, Tom Goldstein

TL;DR
This paper investigates CLIP models through inversion, showing that inverted images align semantically with their target prompts and exposing biases, including NSFW content that appears even for benign prompts.
Contribution
It introduces an inversion-based method to analyze CLIP models, providing new insights into their semantic capabilities and biases.
Findings
Inverted images align semantically with prompts.
CLIP models can blend concepts and exhibit biases.
NSFW images can appear even with innocuous prompts.
Abstract
We employ an inversion-based approach to examine CLIP models. Our examination reveals that inverting CLIP models results in the generation of images that exhibit semantic alignment with the specified target prompts. We leverage these inverted images to gain insights into various aspects of CLIP models, such as their ability to blend concepts and inclusion of gender biases. We notably observe instances of NSFW (Not Safe For Work) images during model inversion. This phenomenon occurs even for semantically innocuous prompts, like "a beautiful landscape," as well as for prompts involving the names of celebrities.
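The abstract does not spell out the optimization itself. One common reading of "inversion" is to hold the model fixed and adjust the pixels of an image until its embedding matches the embedding of the target prompt. The sketch below illustrates only that idea on a toy problem: `toy_image_encoder` is a hypothetical linear stand-in for the CLIP image encoder, `text_emb` is a random vector standing in for a prompt embedding, and the step count and learning rate are arbitrary assumptions. It is not the authors' implementation, which works with a real CLIP model.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def toy_image_encoder(weights, pixels):
    """Hypothetical stand-in for an image encoder: a fixed linear map."""
    return [dot(row, pixels) for row in weights]


def invert(weights, text_emb, init_pixels, steps=3000, lr=0.2):
    """Gradient ascent on cosine(encoder(pixels), text_emb) w.r.t. pixels."""
    pixels = list(init_pixels)
    t_norm = math.sqrt(dot(text_emb, text_emb))
    for _ in range(steps):
        emb = toy_image_encoder(weights, pixels)
        e_norm = math.sqrt(dot(emb, emb))
        cos = dot(emb, text_emb) / (e_norm * t_norm)
        # Analytic gradient of the cosine similarity w.r.t. the embedding.
        grad_emb = [t / (e_norm * t_norm) - cos * e / (e_norm ** 2)
                    for t, e in zip(text_emb, emb)]
        # Chain rule through the linear encoder: grad_pixels = W^T grad_emb.
        grad_pixels = [sum(row[j] * g for row, g in zip(weights, grad_emb))
                       for j in range(len(pixels))]
        # Only the pixels are updated; the encoder weights stay frozen.
        pixels = [p + lr * g for p, g in zip(pixels, grad_pixels)]
    return pixels


def make_toy_problem(seed=0, emb_dim=4, n_pixels=16):
    """Random frozen encoder, random target embedding, random start image."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0, 1) for _ in range(n_pixels)]
               for _ in range(emb_dim)]
    text_emb = [rng.gauss(0, 1) for _ in range(emb_dim)]
    init_pixels = [rng.gauss(0, 1) for _ in range(n_pixels)]
    return weights, text_emb, init_pixels


if __name__ == "__main__":
    W, target, x0 = make_toy_problem()
    x = invert(W, target, x0)
    print("similarity before:", cosine(toy_image_encoder(W, x0), target))
    print("similarity after: ", cosine(toy_image_encoder(W, x), target))
```

With a real model, the hand-written gradient would typically be replaced by automatic differentiation through the image encoder; the structure of the loop (frozen model, trainable pixels, similarity objective) is the part this sketch is meant to convey.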
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
Strengths: - The analysis shows that CLIP models trained on more data are more amenable to image inversion (better quality of inverted images
Weaknesses: - No novelty in methodology or findings. The problem of image inversion has been studied in the context of discriminative (deepdream and papers cited in the submission) and generative models (e.g., https://arxiv.org/abs/2405.15012, https://dl.acm.org/doi/abs/10.1145/3372297.3417270). The problem of identifying CLIP biases has also been studied in the past (e.g., https://arxiv.org/abs/2311.05746, https://ojs.aaai.org/index.php/AIES/article/view/31657 and references therein). - Fairly
This work presents an interesting analysis on interpreting embedding-based image features from the widely popular CLIP model (where the information about training data is proprietary). It flags important drawbacks elicited via model inversion which point out potential flaws in the training data, flagging an important issue given that CLIP image embeddings are widely used. Further, the authors have aptly presented the different implications in an organized and coherent fashion making it easy to f
While the work presents an interesting analysis, it is unclear how these insights can be concretely leveraged to improve image generation pipelines as of today. Can we inform any of the following? a) Modelling strategies and making models more robust to such potentially bad data points? Any kind of safety finetuning? b) data curation strategies if any? Further, some modelling choices (e.g. choice of transformations) are not well motivated. These questions have been outlined in the next section.
The paper presents a solid analysis of CLIP models through a novel approach using model inversion. The paper is well written and the motivation is clear. The problem studied in the paper is timely and will be of good use to the community. To my knowledge, the idea is indeed novel. Specifically, the authors clearly demonstrate that CLIP models possess the capability to blend concepts, akin to generative models like DALLE and IMAGEN. Their study reveals associations between seemingly harmless
Overall, the paper appears to be poorly formatted, giving the impression of being rushed without proper attention to formatting guidelines. For example, Table 4 is misaligned and requires reformatting, and there are 10 unnecessary empty lines between Figure 2 and the text. The same issue occurs with Table 6 in the appendix. This is disappointing, as the text within the paper is well-written and addresses an important topic. Additionally, while the experiments in Table 4 seem convincing, I belie
- Applying model inversion to CLIP models is a suitable way to obtain insights into its proprietary and unavailable training data. - The paper reveals that the CLIP training data was not cleaned from potentially harmful content. - It is interesting to see that CLIP model inversion can produce (somewhat) coherent objects and text
- While CLIP model inversion is interesting for the sake of scientific curiosity, the paper does not discuss any practical implications on downstream tasks, such as retrieval, classification, segmentation, text-to-image modeling. Sec. 7 (l. 470) even states that "these behaviors do not have to be represented in other operational modes". - The paper makes rather strong claims which are mostly supported by few qualitative examples. Furthermore, the experimental analysis is not thorough enough. Th
Taxonomy
Topics: Multi-Agent Systems and Negotiation · Global Financial Regulation and Crises · Economic Policies and Impacts
Methods: Contrastive Language-Image Pre-training
