pOps: Photo-Inspired Diffusion Operators
Elad Richardson, Yuval Alaluf, Ali Mahdavi-Amiri, Daniel Cohen-Or

TL;DR
pOps introduces a framework that trains semantic operators directly on CLIP image embeddings using diffusion models, enabling more effective and diverse visual concept manipulations beyond language limitations.
Contribution
It presents a novel method to learn semantic image operators via diffusion models on CLIP embeddings, enhancing visual concept control in image generation.
Findings
Operators can be trained to perform diverse semantic manipulations.
Using diffusion priors improves learning of meaningful image transformations.
The approach enables direct supervision with a textual CLIP loss.
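To make the last finding concrete, here is a minimal sketch of what a textual CLIP loss on an image embedding could look like. The summary does not state the exact form of the loss, so the cosine-distance formulation, the function name `clip_text_loss`, and the toy vectors below are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def clip_text_loss(pred_image_emb, text_emb):
    """Hypothetical loss: 1 - cosine similarity between a predicted
    image embedding and a target text embedding (assumed form)."""
    p = pred_image_emb / np.linalg.norm(pred_image_emb)
    t = text_emb / np.linalg.norm(text_emb)
    return 1.0 - float(np.dot(p, t))

# Toy vectors standing in for real CLIP embeddings.
target_text = np.array([1.0, 0.0, 0.0])
aligned_pred = np.array([2.0, 0.0, 0.0])     # same direction as the text
orthogonal_pred = np.array([0.0, 3.0, 0.0])  # unrelated direction

loss_aligned = clip_text_loss(aligned_pred, target_text)
loss_orthogonal = clip_text_loss(orthogonal_pred, target_text)
```

Under this assumed form, the loss shrinks as the operator's output embedding points in the same direction as the text embedding, which is why it can supervise an operator without paired image targets.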
Abstract
Text-guided image generation enables the creation of visual content from textual descriptions. However, certain visual concepts cannot be effectively conveyed through language alone. This has sparked a renewed interest in utilizing the CLIP image embedding space for more visually-oriented tasks through methods such as IP-Adapter. Interestingly, the CLIP image embedding space has been shown to be semantically meaningful, where linear operations within this space yield semantically meaningful results. Yet, the specific meaning of these operations can vary unpredictably across different images. To harness this potential, we introduce pOps, a framework that trains specific semantic operators directly on CLIP image embeddings. Each pOps operator is built upon a pretrained Diffusion Prior model. While the Diffusion Prior model was originally trained to map between text embeddings and image…
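The "linear operations" on CLIP image embeddings mentioned in the abstract can be illustrated with a short sketch. The vectors here are random stand-ins rather than real CLIP outputs, and the embedding width `dim`, the helper names, and the interpolation weight are assumptions chosen only for illustration. This shows the simple linear baseline that the abstract describes as unpredictable across images, not a pOps operator.

```python
import numpy as np

def normalize(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(normalize(a), normalize(b)))

def lerp(a, b, alpha):
    """Linear interpolation between two embeddings, renormalized."""
    return normalize((1.0 - alpha) * a + alpha * b)

rng = np.random.default_rng(0)
dim = 768  # assumed embedding width; the real value depends on the CLIP variant

# Random stand-ins for two CLIP image embeddings (hypothetical inputs).
object_emb = normalize(rng.standard_normal(dim))
texture_emb = normalize(rng.standard_normal(dim))

# A linear "mix" of the two concepts in embedding space.
mixed = lerp(object_emb, texture_emb, 0.5)
```

The midpoint embedding is equally similar to both inputs, but nothing in this arithmetic says which attributes of each image survive. That ambiguity is the motivation the abstract gives for training dedicated operators instead.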
Taxonomy
Topics: Advanced Mathematical Modeling in Engineering
Methods: Contrastive Language-Image Pre-training · Diffusion
