pOps: Photo-Inspired Diffusion Operators

Elad Richardson; Yuval Alaluf; Ali Mahdavi-Amiri; Daniel Cohen-Or

arXiv:2406.01300 · cs.CV · June 4, 2024

Open Access

TL;DR

pOps is a framework that trains semantic operators directly on CLIP image embeddings, with each operator built on a pretrained Diffusion Prior model, so that visual concepts that are hard to describe in language can still be manipulated.

Contribution

The paper presents a method for learning semantic image operators with diffusion models that act on CLIP image embeddings, giving finer control over visual concepts in image generation.

Findings

01. Operators can be trained to perform diverse semantic manipulations.

02. Using diffusion priors improves learning of meaningful image transformations.

03. The approach enables direct supervision with a textual CLIP loss.
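The exact form of the textual CLIP loss is not given on this page. A common formulation, assumed here purely for illustration, is one minus the cosine similarity between the predicted image embedding and a text embedding. The function names and the toy 3-dimensional vectors below are hypothetical stand-ins for real CLIP embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def textual_clip_loss(image_embedding, text_embedding):
    """Assumed loss form: 1 - cos(image, text).

    The loss is 0 when the two embeddings point in the same direction
    and grows as they diverge. This is an illustrative choice, not
    necessarily the formulation used by the paper.
    """
    return 1.0 - cosine_similarity(image_embedding, text_embedding)


# Toy vectors standing in for an operator's output and a text prompt embedding.
predicted_image_embedding = [0.2, 0.9, 0.1]
target_text_embedding = [0.1, 1.0, 0.0]
loss = textual_clip_loss(predicted_image_embedding, target_text_embedding)
print(f"loss = {loss:.4f}")
```

Because the operator's output already lives in the CLIP image embedding space, a loss of this kind can be computed on the output directly, without first decoding it to an image.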

Abstract

Text-guided image generation enables the creation of visual content from textual descriptions. However, certain visual concepts cannot be effectively conveyed through language alone. This has sparked a renewed interest in utilizing the CLIP image embedding space for more visually-oriented tasks through methods such as IP-Adapter. Interestingly, the CLIP image embedding space has been shown to be semantically meaningful, where linear operations within this space yield semantically meaningful results. Yet, the specific meaning of these operations can vary unpredictably across different images. To harness this potential, we introduce pOps, a framework that trains specific semantic operators directly on CLIP image embeddings. Each pOps operator is built upon a pretrained Diffusion Prior model. While the Diffusion Prior model was originally trained to map between text embeddings and image…
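To make the phrase "linear operations within this space" concrete, here is a minimal sketch of one such operation: shifting an embedding along the difference between two reference embeddings and renormalizing. The helper names, the `strength` parameter, and the 2-dimensional toy vectors are assumptions for illustration; real CLIP image embeddings have hundreds of dimensions and would come from an image encoder.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def apply_direction(embedding, source, target, strength=1.0):
    """Shift `embedding` along the direction (target - source).

    This is plain vector arithmetic: the same direction is added
    regardless of what `embedding` depicts.
    """
    shifted = [
        e + strength * (t - s)
        for e, s, t in zip(embedding, source, target)
    ]
    return normalize(shifted)


# Toy example: move an embedding from the "source" concept toward the "target".
image_embedding = [1.0, 0.0]
source_concept = [1.0, 0.0]
target_concept = [0.0, 1.0]
edited = apply_direction(image_embedding, source_concept, target_concept)
print(edited)
```

A fixed direction like this is applied identically to every input, which is one way to see why the meaning of a linear edit can vary from image to image, and why an operator trained for a specific task may behave more predictably.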

Taxonomy

Topics: Advanced Mathematical Modeling in Engineering

Methods: Contrastive Language-Image Pre-training · Diffusion