Textualize Visual Prompt for Image Editing via Diffusion Bridge

Pengcheng Xu; Qingnan Fan; Fei Kou; Shuai Qin; Hong Gu; Ruoyu Zhao; Charles Ling; Boyu Wang

arXiv:2501.03495·cs.CV·January 28, 2025

TL;DR

This paper introduces a novel diffusion bridge framework that converts visual prompts into text embeddings for image editing, eliminating the need for retraining and enhancing scalability and generalization.

Contribution

It proposes a diffusion-based method that textualizes visual prompts using a single text-to-image model, avoiding explicit image-to-image models and retraining.

Findings

1. Achieves high fidelity and contextual coherence in image editing.

2. Demonstrates strong generalization with just one image pair as prompt.

3. Outperforms existing methods in delicate editing tasks.

Abstract

A visual prompt, a pair of before-and-after edited images, can convey image transformations that are hard to describe in words and has proven effective in image editing. However, current visual prompt methods rely on a pretrained text-guided image-to-image generative model, which requires triplets of text, before image, and after image to retrain a text-to-image model. Crafting such triplets and retraining limit the scalability and generalization of editing. In this paper, we present a framework based on any single text-to-image model, without reliance on an explicit image-to-image model, thus enhancing generalizability and scalability. Specifically, by leveraging the probability-flow ordinary differential equation, we construct a diffusion bridge to transfer the distribution between before-and-after images under text guidance. By optimizing the text via the bridge, the framework adaptively textualizes the…
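
For orientation, the probability-flow equation the abstract refers to is usually written in the general form below. This is the standard form from the score-based diffusion literature, not a formula taken from this page: the symbols f (drift), g (diffusion coefficient), p_t (marginal density at time t), and c (text condition) are assumed notation, and the paper's own parametrization may differ.

```latex
% Forward SDE (assumed general form):  dx_t = f(x_t, t) dt + g(t) dw_t
% Its probability-flow ODE shares the same marginals p_t but is deterministic:
\frac{\mathrm{d}x_t}{\mathrm{d}t}
  = f(x_t, t) - \frac{1}{2}\, g(t)^2 \, \nabla_{x_t} \log p_t(x_t \mid c)
```

Because this ODE is deterministic, it can be integrated in either direction in time. One plausible reading of the "bridge" is that the before image is mapped to a latent along the ODE and then mapped back under the text condition c toward the after image, with c being the quantity that is optimized.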

Taxonomy

Topics: Image Retrieval and Classification Techniques

Methods: Softmax · Attention Is All You Need · Diffusion