A training-free framework for high-fidelity appearance transfer via diffusion transformers

Shengrong Gu; Ye Wang; Song Wu; Rui Ma; Qian Wang; Lanjun Wang; Zili Yi

arXiv:2603.26767·cs.CV·March 31, 2026

TL;DR

This paper introduces a training-free framework that uses diffusion transformers and a novel attention-sharing mechanism to achieve high-fidelity appearance transfer while preserving scene structure.

Contribution

The paper presents what the authors describe as the first training-free framework for controlling diffusion transformers (DiTs) in appearance transfer. It disentangles scene structure from appearance without fine-tuning the model.

Findings

01. Outperforms specialized methods in appearance transfer tasks.

02. Operates effectively at 1024px resolution.

03. Achieves state-of-the-art results in structural preservation and appearance fidelity.

Abstract

Diffusion Transformers (DiTs) excel at generation, but their global self-attention makes controllable, reference-image-based editing a distinct challenge. Unlike U-Nets, naively injecting local appearance into a DiT can disrupt its holistic scene structure. We address this by proposing the first training-free framework specifically designed to tame DiTs for high-fidelity appearance transfer. Our core is a synergistic system that disentangles structure and appearance. We leverage high-fidelity inversion to establish a rich content prior for the source image, capturing its lighting and micro-textures. A novel attention-sharing mechanism then dynamically fuses purified appearance features from a reference, guided by geometric priors. Our unified approach operates at 1024px and outperforms specialized methods on tasks ranging from semantic attribute transfer to fine-grained material…
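
The abstract does not spell out how the attention-sharing mechanism works, so the following is only a minimal sketch of the generic idea behind attention sharing in training-free editing, not the authors' method. It assumes source-image queries attend over their own keys and values concatenated with those of the reference image. The function name `shared_attention` and the boolean `ref_mask` (a hypothetical stand-in for guidance from geometric priors) are illustrative assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def shared_attention(q_src, k_src, v_src, k_ref, v_ref, ref_mask=None):
    """Single-head attention in which source queries also see reference tokens.

    q_src, k_src, v_src: arrays of shape (n_src, d) from the source branch.
    k_ref, v_ref:        arrays of shape (n_ref, d) from the reference branch.
    ref_mask:            optional bool array (n_src, n_ref); True where a source
                         token may attend to a reference token (hypothetical
                         placeholder for geometry-based guidance).
    """
    d = q_src.shape[-1]
    n_src = k_src.shape[0]

    # Share attention by extending the key/value set with reference tokens.
    k = np.concatenate([k_src, k_ref], axis=0)
    v = np.concatenate([v_src, v_ref], axis=0)

    logits = q_src @ k.T / np.sqrt(d)
    if ref_mask is not None:
        # Blocked reference tokens receive zero weight after the softmax.
        logits[:, n_src:] = np.where(ref_mask, logits[:, n_src:], -np.inf)

    weights = softmax(logits, axis=-1)
    return weights @ v
```

With `ref_mask` set entirely to `False`, the function reduces to ordinary self-attention over the source tokens, which is one way to see that the reference only contributes appearance where the mask permits it.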
