Training-free Stylized Text-to-Image Generation with Fast Inference

Xin Ma; Yaohui Wang; Xinyuan Chen; Tien-Tsin Wong; Cunjian Chen

arXiv:2505.19063 · cs.CV · May 28, 2025

TL;DR

This paper introduces OmniPainter, a training-free method for stylized text-to-image generation with fast inference. It builds on a pre-trained diffusion model, extracts style statistics from reference style images without fine-tuning, and is reported to outperform existing stylized image generation methods.

Contribution

It proposes an approach that combines the self-consistency property of latent consistency models with a norm mixture of self-attention, enabling stylization without additional training or optimization.

Findings

01. Outperforms state-of-the-art stylized image generation methods.
02. Enables style transfer without fine-tuning or extra optimization.
03. Achieves high-quality stylized images with fast inference.

Abstract

Although diffusion models exhibit impressive generative capabilities, existing methods for stylized image generation based on these models often require textual inversion or fine-tuning with style images, which is time-consuming and limits the practical applicability of large-scale diffusion models. To address these challenges, we propose OmniPainter, a novel stylized image generation method that leverages a pre-trained large-scale diffusion model without requiring fine-tuning or any additional optimization. Specifically, we exploit the self-consistency property of latent consistency models to extract representative style statistics from reference style images to guide the stylization process. We then introduce the norm mixture of self-attention, which enables the model to query the most relevant style patterns from these statistics for the intermediate output…
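The abstract describes two ingredients: style statistics taken from reference images, and an attention step in which the intermediate output queries the most relevant style patterns. The following is a minimal NumPy sketch of that general idea only, not the paper's actual norm mixture of self-attention, whose details are not given here. It assumes that "style statistics" can be approximated by per-channel mean and standard deviation of style tokens, and that the two signals are blended with a hypothetical `mix` weight; the function names `cross_query_attention`, `match_statistics`, and `stylize_tokens` are illustrative and do not come from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_query_attention(content_tokens, style_tokens):
    # Queries come from the content (intermediate output) tokens,
    # keys and values come from the style tokens.
    d = content_tokens.shape[-1]
    scores = content_tokens @ style_tokens.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ style_tokens, weights

def match_statistics(x, style_mean, style_std, eps=1e-5):
    # AdaIN-style renormalization (an assumption, not the paper's formula):
    # standardize each channel, then rescale to the style mean and std.
    mean = x.mean(axis=0, keepdims=True)
    std = x.std(axis=0, keepdims=True)
    return (x - mean) / (std + eps) * style_std + style_mean

def stylize_tokens(content_tokens, style_tokens, mix=0.5):
    # Hypothetical blend of attended style patterns and renormalized content.
    style_mean = style_tokens.mean(axis=0, keepdims=True)
    style_std = style_tokens.std(axis=0, keepdims=True)
    attended, weights = cross_query_attention(content_tokens, style_tokens)
    renormed = match_statistics(content_tokens, style_mean, style_std)
    return mix * attended + (1.0 - mix) * renormed, weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    content = rng.normal(size=(16, 8))                    # 16 content tokens, 8 channels
    style = rng.normal(loc=2.0, scale=3.0, size=(10, 8))  # 10 style tokens
    out, weights = stylize_tokens(content, style, mix=0.5)
    print(out.shape, weights.shape)
```

With `mix=0.0` the output tokens simply take on the per-channel mean and standard deviation of the style tokens; with `mix=1.0` each output token is a softmax-weighted combination of style tokens. In a real pipeline these tokens would be features inside the diffusion model's self-attention layers rather than random arrays.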

Taxonomy

Topics: Video Analysis and Summarization · Image Retrieval and Classification Techniques · Generative Adversarial Networks and Image Synthesis

Methods: Diffusion · Consistency Models · ALIGN