KV Inversion: KV Embeddings Learning for Text-Conditioned Real Image Action Editing

Jiancheng Huang; Yifan Liu; Jin Qin; Shifeng Chen

arXiv:2309.16608 · cs.CV · September 29, 2023 · 1 citation

TL;DR

KV Inversion is a method for text-conditioned editing of real images that changes the subject's action to match the editing prompt while preserving the original content, without retraining the underlying diffusion model.

Contribution

The paper presents KV Inversion, an approach that performs action editing on real images without retraining the underlying Stable Diffusion model and without time-consuming training on a large-scale dataset.

Findings

01 Produces edited results whose action matches the editing prompt.

02 Preserves the texture and identity of the object in the original real image.

03 Does not require retraining the Stable Diffusion model.

Abstract

Text-conditioned image editing is a recently emerged and highly practical task, and its potential is immeasurable. However, most concurrent methods are unable to perform action editing, i.e., they cannot produce results that both conform to the action semantics of the editing prompt and preserve the content of the original image. To address action editing, we propose KV Inversion, a method that achieves satisfactory reconstruction and action editing performance and solves two major problems: 1) the edited result matches the corresponding action, and 2) the edited object retains the texture and identity of the original real image. In addition, our method does not require training the Stable Diffusion model itself, nor does it require scanning a large-scale dataset to perform time-consuming training.

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition

Methods: Diffusion