DiffuseST: Unleashing the Capability of the Diffusion Model for Style Transfer
Ying Hu, Chenyi Zhuang, Pan Gao

TL;DR
DiffuseST introduces a training-free style transfer method that combines textual and spatial features using diffusion models, enabling balanced and controllable artistic style transfer without retraining.
Contribution
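The paper proposes a training-free approach that pairs a textual embedding of the style image, extracted with the BLIP-2 encoder, with spatial features obtained through DDIM inversion, and injects content and style separately, so no training or fine-tuning is required.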
Findings
Effective and robust style transfer results.
Enhanced control over content and style balance.
Potential applicability to other tasks.
Abstract
Style transfer aims to fuse the artistic representation of a style image with the structural information of a content image. Existing methods train specific networks or utilize pre-trained models to learn content and style features. However, they rely solely on textual or spatial representations that are inadequate to achieve the balance between content and style. In this work, we propose a novel and training-free approach for style transfer, combining textual embedding with spatial features and separating the injection of content or style. Specifically, we adopt the BLIP-2 encoder to extract the textual representation of the style image. We utilize the DDIM inversion technique to extract intermediate embeddings in content and style branches as spatial features. Finally, we harness the step-by-step property of diffusion models by separating the injection of content and style in the…
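The DDIM inversion step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses the standard deterministic DDIM update (eta = 0), plain Python lists in place of latent tensors, and a hypothetical `toy_eps_model` standing in for a pretrained noise predictor. The names `ddim_step`, `ddim_invert`, and the `alphas` schedule are assumptions made for illustration only.

```python
import math


def ddim_step(x, eps, alpha_from, alpha_to):
    """Move a latent deterministically between two noise levels (DDIM, eta = 0).

    alpha_from / alpha_to are cumulative alpha-bar values in (0, 1].
    """
    out = []
    for xi, ei in zip(x, eps):
        # Predict the clean latent implied by the current noise estimate.
        x0_pred = (xi - math.sqrt(1.0 - alpha_from) * ei) / math.sqrt(alpha_from)
        # Re-noise that prediction to the target noise level.
        out.append(math.sqrt(alpha_to) * x0_pred + math.sqrt(1.0 - alpha_to) * ei)
    return out


def ddim_invert(x0, eps_model, alphas):
    """Run inversion from a clean latent and keep every intermediate latent."""
    latents = [list(x0)]
    for t in range(len(alphas) - 1):
        eps = eps_model(latents[-1], t)
        latents.append(ddim_step(latents[-1], eps, alphas[t], alphas[t + 1]))
    return latents


def toy_eps_model(x, t):
    # Hypothetical stand-in for a pretrained noise predictor (depends only on t).
    return [0.05 * (t + 1)] * len(x)


# Assumed toy schedule: alpha-bar decreases as the step index grows.
alphas = [0.999 - 0.1 * i for i in range(10)]
content_latents = ddim_invert([0.5, -1.0, 2.0], toy_eps_model, alphas)
```

In this sketch the list `content_latents` plays the role of the intermediate embeddings that the abstract describes as spatial features; a style branch would run the same routine on the style image's latent. With a real noise predictor the inversion is only approximately reversible, whereas the toy predictor here makes it exact.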
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Diffusion
