Textual and Visual Prompt Fusion for Image Editing via Step-Wise Alignment

Zhanbo Feng; Zenan Ling; Xinyu Lu; Ci Gong; Feng Zhou; Wugedele Bao; Jie Li; Fan Yang; Robert C. Qiu

arXiv:2308.15854·cs.CV·January 7, 2025

TL;DR

This paper introduces a novel image editing framework that fuses visual references and text guidance within a pre-trained diffusion model, achieving high-quality, semantically consistent edits with intuitive control.

Contribution

It presents a new fusion approach that integrates visual and textual prompts into a frozen diffusion model using minimal neural network components, enhancing control and image quality.

Findings

01 Produces higher quality images than state-of-the-art methods

02 Ensures semantic consistency and realistic editing effects

03 Works effectively across various benchmark datasets

Abstract

The use of denoising diffusion models is becoming increasingly popular in the field of image editing. However, current approaches often rely on either image-guided methods, which provide a visual reference but lack control over semantic consistency, or text-guided methods, which ensure alignment with the text guidance but compromise visual quality. To resolve this issue, we propose a framework that integrates a fusion of generated visual references and text guidance into the semantic latent space of a frozen pre-trained diffusion model. Using only a tiny neural network, our framework provides control over diverse content and attributes, driven intuitively by the text prompt. Compared to state-of-the-art methods, the framework generates images of higher quality while providing realistic editing effects across various benchmark datasets.
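
The abstract's central idea, training only a tiny network that steers the latent of a frozen model using fused text and visual guidance, can be illustrated with a toy sketch. This is not the authors' implementation: the linear maps stand in for a pre-trained diffusion model, the convex-combination `fuse` function is an assumed fusion rule, and every name (`FROZEN`, `fuse`, `forward`, `train_step`) is hypothetical. It only shows the parameter split: the frozen weights are never updated, while the tiny network receives gradients.

```python
import copy
import random

random.seed(0)
DIM = 4  # toy feature size (assumption, not from the paper)


def rand_matrix(rows, cols, scale):
    """Random dense weights in [-scale, scale]."""
    return [[random.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def linear(weights, x):
    """Dense layer without bias: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


# Stand-in for the frozen pre-trained model (never updated below).
FROZEN = {"enc": rand_matrix(DIM, DIM, 1.0), "dec": rand_matrix(DIM, DIM, 1.0)}


def fuse(text_emb, visual_emb, alpha=0.5):
    """Hypothetical fusion: convex combination of the two guidance vectors."""
    return [alpha * t + (1.0 - alpha) * v for t, v in zip(text_emb, visual_emb)]


def forward(x, guidance, tiny):
    """Encode, shift the latent with the tiny network, decode."""
    h = linear(FROZEN["enc"], x)
    z = h + guidance  # list concatenation: input to the tiny network
    delta = linear(tiny, z)
    out = linear(FROZEN["dec"], [a + b for a, b in zip(h, delta)])
    return out, z


def loss_and_grad(x, guidance, target, tiny):
    """Squared error and its gradient with respect to the tiny network only."""
    out, z = forward(x, guidance, tiny)
    residual = [o - t for o, t in zip(out, target)]
    loss = sum(r * r for r in residual)
    # Backpropagate through the frozen decoder: d(loss)/d(delta) = 2 * dec^T r.
    d_delta = [2.0 * sum(FROZEN["dec"][i][j] * residual[i] for i in range(DIM))
               for j in range(DIM)]
    grad = [[d * v for v in z] for d in d_delta]
    return loss, grad


def train_step(x, guidance, target, tiny, lr=1e-4):
    """One gradient-descent step that touches only the tiny network."""
    _, grad = loss_and_grad(x, guidance, target, tiny)
    return [[w - lr * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(tiny, grad)]


# Toy inputs (all values are illustrative placeholders).
x = [0.5, -0.2, 0.1, 0.9]
guidance = fuse([1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0])
target = [0.0, 0.0, 0.0, 0.0]

tiny = rand_matrix(DIM, 2 * DIM, 0.1)  # the only trainable weights
frozen_before = copy.deepcopy(FROZEN)
loss_before, _ = loss_and_grad(x, guidance, target, tiny)
tiny = train_step(x, guidance, target, tiny)
loss_after, _ = loss_and_grad(x, guidance, target, tiny)
```

In a real setting the frozen part would be a pre-trained diffusion network with gradient updates disabled, and the small step size here is chosen only so that a single update cannot increase the toy loss.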

Code & Models

Repositories

sadangelf/editing-via-step-wise-alignment
PyTorch · Official

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Advanced Image Processing Techniques

Methods: Diffusion