Prompt Tuning Inversion for Text-Driven Image Editing Using Diffusion Models

Wenkai Dong; Song Xue; Xiaoyue Duan; Shumin Han

arXiv:2305.04441·cs.CV·May 9, 2023·6 cites

Open Access

TL;DR

This paper introduces Prompt Tuning Inversion, a fast and accurate method for text-driven image editing with diffusion models that balances editability and fidelity, outperforming existing techniques.

Contribution

The paper proposes a novel inversion technique that encodes input images into learnable embeddings, enabling high-quality, user-friendly image editing guided solely by text prompts.

Findings

01. Outperforms state-of-the-art baselines on ImageNet editing tasks

02. Achieves a good balance between editability and image fidelity

03. Enables precise color and object modifications using only text prompts

Abstract

Recently, large-scale language-image models (e.g., text-guided diffusion models) have considerably improved image generation, producing photorealistic images in various domains. Building on this success, current image editing methods use text to achieve intuitive and versatile modification of images. To edit a real image using diffusion models, one must first invert the image to a noisy latent, from which an edited image is then sampled with a target text prompt. However, most methods lack at least one of the following: user-friendliness (e.g., they require additional masks or precise descriptions of the input image), generalization to larger domains, or high fidelity to the input image. In this paper, we design an accurate and quick inversion technique, Prompt Tuning Inversion, for text-driven image editing. Specifically, our proposed editing method consists of a reconstruction stage…
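The generic invert-then-sample pipeline mentioned in the abstract can be illustrated with a toy sketch. This is not the paper's Prompt Tuning Inversion; it only shows deterministic DDIM-style inversion followed by sampling under a different condition. Everything here is an assumption made for illustration: the schedule in `make_alphas`, the stand-in noise predictor `toy_eps` (a real model would be a trained network that also depends on the latent), and the scalar `cond` values that play the role of prompt embeddings.

```python
import numpy as np

def make_alphas(num_steps=50):
    # Hypothetical cumulative-alpha schedule; index 0 is the clean image (alpha = 1).
    betas = np.linspace(1e-4, 2e-2, num_steps)
    return np.concatenate([[1.0], np.cumprod(1.0 - betas)])

def toy_eps(x, t, cond):
    # Stand-in for a trained noise predictor. It ignores the values of x,
    # which is what makes the round trip below exact; real models do not.
    return np.full_like(x, cond) * (1.0 + 0.01 * t)

def ddim_move(x, eps, a_from, a_to):
    # Deterministic DDIM update between two noise levels.
    x0_pred = (x - np.sqrt(1.0 - a_from) * eps) / np.sqrt(a_from)
    return np.sqrt(a_to) * x0_pred + np.sqrt(1.0 - a_to) * eps

def invert(x0, cond, alphas):
    # Walk from the clean image (t = 0) up to the noisy latent (t = T).
    x = x0
    for t in range(len(alphas) - 1):
        x = ddim_move(x, toy_eps(x, t + 1, cond), alphas[t], alphas[t + 1])
    return x

def sample(x_T, cond, alphas):
    # Walk from the noisy latent (t = T) back down to an image (t = 0).
    x = x_T
    for t in range(len(alphas) - 1, 0, -1):
        x = ddim_move(x, toy_eps(x, t, cond), alphas[t], alphas[t - 1])
    return x

alphas = make_alphas()
image = np.random.default_rng(0).normal(size=(4, 4))  # placeholder "image"
source_cond, target_cond = 0.5, -0.5                  # placeholder "prompts"

latent = invert(image, source_cond, alphas)
reconstruction = sample(latent, source_cond, alphas)  # same condition: recovers the input
edited = sample(latent, target_cond, alphas)          # new condition: output changes
```

In this toy setting, sampling with the source condition reproduces the input, while sampling with the target condition yields a different result. With a real noise predictor the inversion is only approximate, which is one reason the fidelity of the inversion step matters for editing.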

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning

Methods: Diffusion