FBSDiff: Plug-and-Play Frequency Band Substitution of Diffusion Features for Highly Controllable Text-Driven Image Translation
Xiang Gao, Jiaying Liu

TL;DR
FBSDiff introduces a plug-and-play frequency band substitution technique for diffusion features, enabling highly controllable, high-quality text-driven image translation without additional training or fine-tuning.
Contribution
The paper presents a novel frequency band substitution layer that allows dynamic, controllable image-to-image translation using pre-trained diffusion models without retraining.
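To make the idea of frequency band substitution concrete, here is a minimal sketch on a single 2D feature map. It is an illustration under stated assumptions, not the authors' implementation: the summary above does not say which transform or band definition the paper uses, so the orthonormal DCT, the index-sum band rule, and the names `dct_matrix`, `band_mask`, and `substitute_band` are all hypothetical choices made for this example.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix; row k is the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    mat[0, :] = np.sqrt(1.0 / n)  # rescale the DC row so the matrix is orthonormal
    return mat


def band_mask(h: int, w: int, lo: int, hi: int) -> np.ndarray:
    """Boolean mask of coefficients whose index sum u + v lies in [lo, hi).

    Assumption: bands are defined by the index sum; other rules are possible.
    """
    u = np.arange(h)[:, None]
    v = np.arange(w)[None, :]
    return ((u + v) >= lo) & ((u + v) < hi)


def substitute_band(target: np.ndarray, reference: np.ndarray,
                    lo: int, hi: int) -> np.ndarray:
    """Replace one frequency band of `target` with the same band of `reference`."""
    h, w = target.shape
    ch, cw = dct_matrix(h), dct_matrix(w)
    t_freq = ch @ target @ cw.T          # 2D DCT of the target features
    r_freq = ch @ reference @ cw.T       # 2D DCT of the reference features
    mask = band_mask(h, w, lo, hi)
    mixed = np.where(mask, r_freq, t_freq)  # take the band from the reference
    return ch.T @ mixed @ cw             # inverse 2D DCT back to the spatial domain


# Example: copy only the lowest frequencies of a reference map into a target map.
rng = np.random.default_rng(0)
target = rng.standard_normal((8, 8))
reference = rng.standard_normal((8, 8))
result = substitute_band(target, reference, lo=0, hi=3)
```

In general signal-processing terms, low-frequency coefficients carry coarse layout and overall intensity while higher ones carry edges and fine detail, so the choice of `lo` and `hi` decides what is inherited from the reference, and the width of the band is one plausible knob for how strongly it constrains the result.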
Findings
Outperforms related methods in visual quality and controllability
Enables flexible control over guiding factors and intensity
Operates without model training or fine-tuning
Abstract
Large-scale text-to-image diffusion models have been a revolutionary milestone in the evolution of generative AI and multimodal technology, enabling impressive image generation from natural-language text prompts. However, the lack of controllability of such models restricts their practical applicability for real-life content creation. Thus, attention has been focused on leveraging a reference image to control text-to-image synthesis, which can also be regarded as manipulating (or editing) a reference image according to a text prompt, namely, text-driven image-to-image translation. This paper contributes a novel, concise, and efficient approach that adapts a pre-trained large-scale text-to-image (T2I) diffusion model to the image-to-image (I2I) paradigm in a plug-and-play manner, realizing high-quality and versatile text-driven I2I translation without any model training, model fine-tuning, or…
Taxonomy
Topics: Image Retrieval and Classification Techniques
Methods: Softmax · Attention Is All You Need · Diffusion
