ProEdit: Inversion-based Editing From Prompts Done Right
Zhi Ouyang, Dian Zheng, Xiao-Ming Wu, Jian-Jian Jiang, Kun-Yu Lin, Jingke Meng, Wei-Shi Zheng

TL;DR
ProEdit introduces two techniques, KV-mix on the attention side and Latents-Shift on the latent side, that reduce over-reliance on source information in inversion-based visual editing. It achieves state-of-the-art results while maintaining background consistency and integrates with existing inversion and editing methods.
Contribution
The paper proposes KV-mix and Latents-Shift to address source influence issues in inversion-based editing, enhancing editing accuracy and consistency.
Findings
Achieves state-of-the-art performance on image and video editing benchmarks.
The methods are plug-and-play and compatible with existing inversion techniques.
Significantly improves attribute edits (e.g., pose, number, or color) without compromising background quality.
Abstract
Inversion-based visual editing provides an effective and training-free way to edit an image or a video based on user instructions. Existing methods typically inject source image information during the sampling process to maintain editing consistency. However, this sampling strategy overly relies on source information, which negatively affects the edits in the target image (e.g., failing to change the subject's attributes like pose, number, or color as instructed). In this work, we propose ProEdit to address this issue both in the attention and the latent aspects. In the attention aspect, we introduce KV-mix, which mixes KV features of the source and the target in the edited region, mitigating the influence of the source image on the editing region while maintaining background consistency. In the latent aspect, we propose Latents-Shift, which perturbs the edited region of the source…
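The abstract describes KV-mix only at a high level, so the following is a minimal sketch of the idea rather than the authors' implementation. It assumes a per-token binary edit mask, a linear blend inside the edited region, and unmodified source K/V outside it. The function name `kv_mix`, the weight `delta`, and the array shapes are hypothetical and chosen for illustration.

```python
import numpy as np

def kv_mix(k_src, v_src, k_tgt, v_tgt, edit_mask, delta=0.5):
    """Illustrative K/V mixing (assumed form, not the paper's exact rule).

    k_*, v_*:  arrays of shape (num_tokens, dim)
    edit_mask: shape (num_tokens,), 1 inside the edited region, 0 outside
    delta:     hypothetical weight on the target features inside the region
    """
    k_src, v_src = np.asarray(k_src, float), np.asarray(v_src, float)
    k_tgt, v_tgt = np.asarray(k_tgt, float), np.asarray(v_tgt, float)
    m = np.asarray(edit_mask, float)[:, None]  # broadcast over feature dim

    # Assumption: inside the edited region, blend target and source linearly.
    k_edit = delta * k_tgt + (1.0 - delta) * k_src
    v_edit = delta * v_tgt + (1.0 - delta) * v_src

    # Assumption: outside the edited region, keep the source features
    # so the background stays consistent.
    k_mix = m * k_edit + (1.0 - m) * k_src
    v_mix = m * v_edit + (1.0 - m) * v_src
    return k_mix, v_mix
```

Under these assumptions, setting `delta=1.0` uses only target features inside the mask, and `delta=0.0` reduces to plain source K/V injection everywhere.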
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques
