Training-free Stylized Text-to-Image Generation with Fast Inference
Xin Ma, Yaohui Wang, Xinyuan Chen, Tien-Tsin Wong, Cunjian Chen

TL;DR
This paper introduces OmniPainter, a training-free method with fast inference for stylized text-to-image generation. It builds on a pre-trained diffusion model, extracts style features from reference style images without fine-tuning, and is reported to outperform existing methods.
Contribution
It proposes an approach that combines the self-consistency property of latent consistency models with a norm mixture of self-attention, enabling stylization without additional training or optimization.
Findings
Outperforms state-of-the-art stylized image generation methods.
Enables style transfer without fine-tuning or extra optimization.
Achieves high-quality stylized images with fast inference.
Abstract
Although diffusion models exhibit impressive generative capabilities, existing methods for stylized image generation based on these models often require textual inversion or fine-tuning with style images, which is time-consuming and limits the practical applicability of large-scale diffusion models. To address these challenges, we propose OmniPainter, a novel stylized image generation method that leverages a pre-trained large-scale diffusion model without requiring fine-tuning or any additional optimization. Specifically, we exploit the self-consistency property of latent consistency models to extract representative style statistics from reference style images to guide the stylization process. We then introduce the norm mixture of self-attention, which enables the model to query the most relevant style patterns from these statistics for the intermediate output…
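To make the idea of "querying the most relevant style patterns" concrete, here is a minimal sketch of plain scaled dot-product attention in which queries come from the intermediate output and keys and values come from style features. This is an illustration only: the abstract above is truncated and does not define the norm mixture of self-attention, so the function name `style_attention`, the toy vectors, and the use of standard attention are all assumptions rather than the authors' method.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def style_attention(content_q, style_k, style_v):
    """Hypothetical helper: each content query attends over the style keys
    and returns a weighted mix of the style values (standard scaled
    dot-product attention, not the paper's norm mixture)."""
    d = len(style_k[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in content_q:
        # Similarity of this query to every style key.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in style_k]
        weights = softmax(scores)
        # Weighted average of style values, one output channel at a time.
        mixed = [
            sum(w * v[j] for w, v in zip(weights, style_v))
            for j in range(len(style_v[0]))
        ]
        out.append(mixed)
    return out


# Toy data (assumed): two content tokens, two style tokens.
content_q = [[1.0, 0.0], [0.0, 1.0]]
style_k = [[4.0, 0.0], [0.0, 4.0]]
style_v = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
result = style_attention(content_q, style_k, style_v)
```

In this toy setup each content token draws mostly from the style token whose key it matches best, which is the general behaviour the abstract describes; how OmniPainter mixes or normalizes these terms is not stated in the text available here.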
Taxonomy
Topics: Video Analysis and Summarization · Image Retrieval and Classification Techniques · Generative Adversarial Networks and Image Synthesis
Methods: Diffusion · Consistency Models · ALIGN
