FreeCus: Free Lunch Subject-driven Customization in Diffusion Transformers

Yanbing Zhang; Zhe Wang; Qin Zhou; Mengping Yang

arXiv:2507.15249 · cs.CV · July 22, 2025

TL;DR

FreeCus is a training-free framework that enhances diffusion transformers' zero-shot subject-driven image synthesis by leveraging attention sharing, improved feature extraction, and multimodal semantic integration, enabling high-fidelity, consistent customization.

Contribution

The paper introduces a training-free method for subject-driven image synthesis with diffusion transformers that combines attention sharing, enhanced feature extraction, and multimodal semantic models.
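To make the idea of attention sharing concrete, here is a minimal, generic sketch in plain Python. It is not FreeCus's actual implementation, which this page does not describe. It only assumes the common formulation in which the target image's queries attend over the target's own keys and values concatenated with those of a reference image. The function name `shared_attention` and the `ref_bias` knob are hypothetical, introduced for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def shared_attention(q_tgt, k_tgt, v_tgt, k_ref, v_ref, ref_bias=0.0):
    """Generic shared attention (illustrative assumption, not the paper's code).

    Each target query attends over target tokens AND reference tokens.
    q_tgt, k_tgt, v_tgt: lists of target token vectors.
    k_ref, v_ref: lists of reference token vectors (may be empty).
    ref_bias: hypothetical additive bias on reference logits; a large
              negative value effectively switches the reference off.
    Returns (outputs, weights), one row per target query.
    """
    d = len(q_tgt[0])
    keys = k_tgt + k_ref        # concatenate along the token axis
    values = v_tgt + v_ref
    n_tgt = len(k_tgt)
    outputs, all_weights = [], []
    for q in q_tgt:
        # Scaled dot-product logits against every key.
        logits = [dot(q, k) / math.sqrt(d) for k in keys]
        # Bias only the columns that belong to the reference tokens.
        logits = [l + ref_bias if j >= n_tgt else l
                  for j, l in enumerate(logits)]
        w = softmax(logits)
        # Weighted sum of value vectors, channel by channel.
        out = [sum(w[j] * values[j][c] for j in range(len(values)))
               for c in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(w)
    return outputs, all_weights


# Toy example: two target tokens, one reference token, dimension 2.
q = [[1.0, 0.0], [0.0, 1.0]]
k_t = [[1.0, 0.0], [0.0, 1.0]]
v_t = [[1.0, 2.0], [3.0, 4.0]]
k_r = [[1.0, 1.0]]
v_r = [[10.0, 20.0]]
out, w = shared_attention(q, k_t, v_t, k_r, v_r)
```

In this sketch, each row of `w` has one weight per target token plus one per reference token, so the reference image's features can flow into the generated image without any training. Setting `ref_bias` to a large negative number recovers ordinary self-attention over the target tokens alone.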

Findings

01. Achieves state-of-the-art zero-shot subject synthesis results.

02. Demonstrates seamless integration with existing inpainting and control modules.

03. Outperforms training-dependent methods in fidelity and consistency.

Abstract

In light of recent breakthroughs in text-to-image (T2I) generation, particularly with diffusion transformers (DiT), subject-driven technologies are increasingly being employed for high-fidelity customized production that preserves subject identity from reference inputs, enabling thrilling design workflows and engaging entertainment. Existing alternatives typically require either per-subject optimization via trainable text embeddings or training specialized encoders for subject feature extraction on large-scale datasets. Such dependencies on training procedures fundamentally constrain their practical applications. More importantly, current methodologies fail to fully leverage the inherent zero-shot potential of modern diffusion transformers (e.g., the Flux series) for authentic subject-driven synthesis. To bridge this gap, we propose FreeCus, a genuinely training-free framework that…
