DiffArtist: Towards Structure and Appearance Controllable Image Stylization

Ruixiang Jiang; Changwen Chen

arXiv:2407.15842 · cs.CV · August 28, 2025

TL;DR

DiffArtist introduces a novel 2D image stylization method that independently controls both structural and appearance styles, enabling more precise and flexible artistic transformations without additional tuning.

Contribution

DiffArtist is the first 2D stylization method to provide simultaneous, fine-grained control over both structure and appearance style strength, which it achieves by modeling structure and appearance generation as separate diffusion processes.

Findings

01. Achieves superior style fidelity and dual controllability.

02. Outperforms existing methods in style transfer quality.

03. Uses a training-free, text-driven approach.

Abstract

Artistic styles are defined by both their structural and appearance elements. Existing neural stylization techniques primarily focus on transferring appearance-level features such as color and texture, often neglecting the equally crucial aspect of structural stylization. To address this gap, we introduce DiffArtist, the first 2D stylization method to offer fine-grained, simultaneous control over both structure and appearance style strength. This dual controllability is achieved by representing structure and appearance generation as separate diffusion processes, necessitating no further tuning or additional adapters. To properly evaluate this new capability of dual stylization, we further propose a Multimodal LLM-based stylization evaluator that aligns significantly better with human preferences than existing metrics. Extensive analysis shows that DiffArtist achieves superior…

Code & Models

Repositories

songrise/Artist (PyTorch, official)

Taxonomy

Topics: Human Motion and Animation · Handwritten Text Recognition Techniques · Natural Language Processing Techniques

Methods: Focus · ALIGN · Diffusion