DiffArtist: Towards Structure and Appearance Controllable Image Stylization
Ruixiang Jiang, Changwen Chen

TL;DR
DiffArtist introduces a 2D image stylization method that controls structural style and appearance style independently, enabling more precise and flexible artistic transformations without additional tuning.
Contribution
DiffArtist is the first neural stylization method to provide simultaneous, fine-grained control over structure and appearance, which it achieves by modeling each with a separate diffusion process.
Findings
Achieves superior style fidelity and dual controllability over structure and appearance.
Outperforms existing methods in style transfer quality.
Uses a training-free, text-driven approach.
Abstract
Artistic styles are defined by both their structural and appearance elements. Existing neural stylization techniques primarily focus on transferring appearance-level features such as color and texture, often neglecting the equally crucial aspect of structural stylization. To address this gap, we introduce DiffArtist, the first 2D stylization method to offer fine-grained, simultaneous control over both structure and appearance style strength. This dual controllability is achieved by representing structure and appearance generation as separate diffusion processes, necessitating no further tuning or additional adapters. To properly evaluate this new capability of dual stylization, we further propose a multimodal LLM-based stylization evaluator that aligns significantly better with human preferences than existing metrics. Extensive analysis shows that DiffArtist achieves superior…
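To make "two independent style strengths" concrete, here is a minimal sketch built on a swapped-in technique: composable classifier-free guidance, in which a structure-conditioned and an appearance-conditioned noise prediction are each blended against an unconditional one with its own weight. This is not DiffArtist's method. The abstract says DiffArtist represents structure and appearance as separate diffusion processes and gives no further detail here. The function `combine_guidance`, its parameter names, and the toy latent shape are all hypothetical.

```python
import numpy as np


def combine_guidance(eps_uncond, eps_structure, eps_appearance,
                     s_structure, s_appearance):
    """Blend two conditional noise predictions with independent strengths.

    Hypothetical illustration (composable classifier-free guidance),
    not DiffArtist's implementation. Each weight scales how far the
    prediction moves from the unconditional estimate toward one condition.
    """
    return (eps_uncond
            + s_structure * (eps_structure - eps_uncond)
            + s_appearance * (eps_appearance - eps_uncond))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shape = (4, 8, 8)  # toy latent: channels, height, width (assumed)
    eps_u, eps_s, eps_a = (rng.standard_normal(shape) for _ in range(3))

    # Strong structural stylization, mild appearance stylization.
    eps = combine_guidance(eps_u, eps_s, eps_a, s_structure=5.0, s_appearance=1.5)
    print(eps.shape)  # (4, 8, 8)
```

Setting either weight to 0 removes that condition's influence while leaving the other untouched, which is the kind of independent knob the abstract describes. How DiffArtist actually exposes its two strengths should be checked against the paper itself.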
