Tuning-Free Adaptive Style Incorporation for Structure-Consistent Text-Driven Style Transfer

Yanqi Ge; Jiaqi Liu; Qingnan Fan; Xi Jiang; Ye Huang; Shuai Qin; Hong Gu; Wen Li; Lixin Duan

arXiv:2404.06835·cs.CV·January 16, 2026·2 cites

TL;DR

This paper introduces a novel adaptive style transfer method for text-to-image diffusion models that preserves image structure while applying style effects, overcoming limitations of previous prompt-level approaches.

Contribution

The paper proposes Adaptive Style Incorporation (ASI), combining Siamese Cross-Attention and Adaptive Content-Style Blending for fine-grained, structure-preserving style transfer.

Findings

01 Superior structure preservation demonstrated

02 Enhanced stylized effects achieved

03 Outperforms previous prompt-level methods

Abstract

In this work, we target the task of text-driven style transfer in the context of text-to-image (T2I) diffusion models. The main challenge is preserving structure consistently while still achieving effective style transfer. Past approaches in this field directly concatenate the content and style prompts for prompt-level style injection, leading to unavoidable structure distortions. We propose a novel solution to the text-driven style transfer task, namely Adaptive Style Incorporation (ASI), which achieves fine-grained feature-level style incorporation. It consists of Siamese Cross-Attention (SiCA), which decouples the single-track cross-attention into a dual-track structure to obtain separate content and style features, and the Adaptive Content-Style Blending (AdaBlending) module, which couples the content and style information in a structure-consistent manner.…
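The dual-track idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that the two tracks share the same key/value projections (one reading of "Siamese") and that each track attends to its own prompt embedding. The `blend` function is a hypothetical placeholder that mixes the two feature maps with a per-position weight; the abstract does not specify the actual AdaBlending rule. All function names, shapes, and the weight value are illustrative assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, text_emb, w_k, w_v):
    # Standard scaled dot-product cross-attention:
    # image-side queries attend to text-token keys and values.
    keys = text_emb @ w_k
    values = text_emb @ w_v
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    return softmax(scores, axis=-1) @ values


def siamese_cross_attention(queries, content_emb, style_emb, w_k, w_v):
    # Assumption: both tracks reuse the same projection weights and differ
    # only in which prompt embedding they attend to.
    content_feat = cross_attention(queries, content_emb, w_k, w_v)
    style_feat = cross_attention(queries, style_emb, w_k, w_v)
    return content_feat, style_feat


def blend(content_feat, style_feat, weight):
    # Hypothetical stand-in for AdaBlending: a per-position convex
    # combination. The paper's actual blending rule is not given here.
    weight = np.clip(weight, 0.0, 1.0)[:, None]
    return (1.0 - weight) * content_feat + weight * style_feat


rng = np.random.default_rng(0)
queries = rng.normal(size=(16, 8))       # 16 spatial positions, feature dim 8
content_emb = rng.normal(size=(5, 12))   # 5 content-prompt tokens
style_emb = rng.normal(size=(3, 12))     # 3 style-prompt tokens
w_k = rng.normal(size=(12, 8))
w_v = rng.normal(size=(12, 8))

content_feat, style_feat = siamese_cross_attention(
    queries, content_emb, style_emb, w_k, w_v
)
blended = blend(content_feat, style_feat, np.full(16, 0.3))
```

By contrast, a prompt-level approach would concatenate the two prompts and run a single `cross_attention` call, so content and style tokens would compete inside one softmax. Keeping the tracks separate makes the content features available unchanged at the blending step.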

Taxonomy

Topics: Natural Language Processing Techniques · Music and Audio Processing

Methods: Diffusion