Textualize Visual Prompt for Image Editing via Diffusion Bridge
Pengcheng Xu, Qingnan Fan, Fei Kou, Shuai Qin, Hong Gu, Ruoyu Zhao, Charles Ling, Boyu Wang

TL;DR
This paper introduces a novel diffusion bridge framework that converts visual prompts into text embeddings for image editing, eliminating the need for retraining and enhancing scalability and generalization.
Contribution
It proposes a diffusion-based method that textualizes visual prompts using a single text-to-image model, avoiding explicit image-to-image models and retraining.
Findings
Achieves high fidelity and contextual coherence in image editing.
Demonstrates strong generalization with just one image pair as prompt.
Outperforms existing methods in delicate editing tasks.
Abstract
A visual prompt, a pair of before-and-after edited images, can convey image transformations that are hard to describe in words, and has proven useful in image editing. However, current visual prompt methods rely on a pretrained text-guided image-to-image generative model, which must be retrained from a text-to-image model on triplets of text, before image, and after image. Crafting such triplets and retraining the model limit the scalability and generalization of editing. In this paper, we present a framework based on any single text-to-image model, without reliance on an explicit image-to-image model, thus enhancing generalizability and scalability. Specifically, by leveraging the probability-flow ordinary differential equation (ODE), we construct a diffusion bridge to transfer the distribution between before-and-after images under text guidance. By optimizing the text via the bridge, the framework adaptively textualizes the…
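To make the bridge idea concrete, here is a minimal one-dimensional toy sketch. It is not the authors' implementation. It assumes a variance-exploding diffusion with sigma(t) = t and Gaussian "image" distributions N(mean, 1), so the score is available in closed form. The scalar `c` stands in for the text embedding, and the names `encode`, `decode`, and `fit_embedding` are hypothetical. The "before" sample is mapped to a latent by integrating the probability-flow ODE forward, then mapped back under a conditional mean `c`, which is optimized so that the output matches the "after" sample.

```python
T, STEPS = 10.0, 500          # integration horizon and Euler step count (toy values)
DT = T / STEPS

def drift(x, t, mean):
    # Probability-flow ODE drift for sigma(t) = t and data ~ N(mean, 1):
    # score(x, t) = -(x - mean) / (1 + t^2), and dx/dt = -t * score(x, t).
    return t * (x - mean) / (1.0 + t * t)

def encode(x, mean):
    """Euler-integrate the ODE from t=0 to t=T (sample -> latent)."""
    for k in range(STEPS):
        x = x + DT * drift(x, k * DT, mean)
    return x

def decode(z, mean):
    """Euler-integrate the ODE from t=T back to t=0 (latent -> sample)."""
    for k in range(STEPS, 0, -1):
        z = z - DT * drift(z, k * DT, mean)
    return z

def fit_embedding(before, after, source_mean=0.0, lr=0.5, iters=60, eps=1e-3):
    """Optimize the stand-in embedding c so the bridge maps before -> after."""
    z = encode(before, source_mean)   # fixed latent of the "before" sample
    c = source_mean                   # initialize at the source condition

    def loss(v):
        return (decode(z, v) - after) ** 2

    for _ in range(iters):
        # Central finite difference; a real model would use autodiff instead.
        grad = (loss(c + eps) - loss(c - eps)) / (2.0 * eps)
        c -= lr * grad
    return z, c

# One before/after pair is enough to fit the embedding in this toy setting.
z, c = fit_embedding(before=1.0, after=3.0)
edited = decode(encode(2.0, 0.0), c)  # reuse c on a new "before" sample
```

In this toy setting the fitted `c` reproduces the "after" sample from the "before" latent, and reusing it on a new input applies roughly the same shift. A real system would replace the closed-form score with a pretrained text-to-image denoiser and the scalar `c` with a text embedding tensor.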
Taxonomy
Topics: Image Retrieval and Classification Techniques
Methods: Softmax · Attention Is All You Need · Diffusion
