PRedItOR: Text Guided Image Editing with Diffusion Prior

Hareesh Ravi; Sachin Kelkar; Midhun Harikumar; Ajinkya Kale

arXiv:2302.07979 · cs.CV · March 22, 2023 · 1 citation

Open Access

TL;DR

PRedItOR performs text-guided image editing with a Hybrid Diffusion Model (a diffusion prior plus a latent diffusion model conditioned on CLIP image embeddings), producing structure-preserving edits without fine-tuning model weights or optimizing text embeddings.

Contribution

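It shows that the diffusion prior of a DALLE-2-style Hybrid Diffusion Model can perform text-guided conceptual edits in the CLIP image embedding space, without the compute-intensive fine-tuning or text-embedding optimization that existing approaches require.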

Findings

01. Achieves comparable or better results than baselines.

02. Does not require fine-tuning or optimization.

03. Enables structure-preserving edits using diffusion prior.

Abstract

Diffusion models have shown remarkable capabilities in generating high-quality and creative images conditioned on text. An interesting application of such models is structure-preserving, text-guided image editing. Existing approaches rely on text-conditioned diffusion models such as Stable Diffusion or Imagen and require compute-intensive optimization of text embeddings or fine-tuning of the model weights for text-guided image editing. We explore text-guided image editing with a Hybrid Diffusion Model (HDM) architecture similar to DALLE-2. Our architecture consists of a diffusion prior model that generates a CLIP image embedding conditioned on a text prompt and a custom Latent Diffusion Model trained to generate images conditioned on a CLIP image embedding. We discover that the diffusion prior model can be used to perform text-guided conceptual edits on the CLIP image embedding space without…
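
The two-stage data flow described in the abstract (a prior that outputs a CLIP image embedding, followed by a decoder conditioned on that embedding) can be illustrated with a toy sketch. This is not the authors' implementation: `toy_embed`, `toy_prior_edit`, and `toy_decoder` are hypothetical stand-ins, the embeddings are random 8-dimensional vectors rather than real CLIP outputs, and the "prior" is a simple interpolation instead of an iterative diffusion process. It only shows which component consumes which input.

```python
import math
import random

EMBED_DIM = 8  # toy size (assumption); real CLIP embeddings are far larger


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity, assuming both inputs are already unit vectors."""
    return sum(x * y for x, y in zip(a, b))


def toy_embed(key):
    """Hypothetical stand-in for a CLIP encoder: one fixed unit vector per key."""
    rng = random.Random(key)
    return normalize([rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)])


def toy_prior_edit(image_emb, text_emb, weight=0.5):
    """Stand-in for the diffusion prior: returns an image embedding pulled
    toward the text. A real prior would run iterative denoising instead."""
    mixed = [(1.0 - weight) * i + weight * t for i, t in zip(image_emb, text_emb)]
    return normalize(mixed)


def toy_decoder(image_emb, size=4):
    """Stand-in for the embedding-conditioned Latent Diffusion Model:
    maps an embedding to a size x size grid of values in [0, 1]."""
    return [
        [0.5 * (1.0 + image_emb[(r * size + c) % EMBED_DIM]) for c in range(size)]
        for r in range(size)
    ]


# Stage 0: pretend these came from CLIP's image and text encoders.
source_emb = toy_embed("photo of a red car")
text_emb = toy_embed("a blue car")

# Stage 1: the prior produces an edited image embedding from the text prompt.
edited_emb = toy_prior_edit(source_emb, text_emb)

# Stage 2: the decoder generates an image conditioned only on that embedding.
edited_image = toy_decoder(edited_emb)
```

The point of the split is that the edit happens entirely in embedding space (stage 1), so the decoder in stage 2 never sees the text prompt directly.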

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis

Methods: Diffusion · Contrastive Language-Image Pre-training · Latent Diffusion Model