FreeCus: Free Lunch Subject-driven Customization in Diffusion Transformers
Yanbing Zhang, Zhe Wang, Qin Zhou, Mengping Yang

TL;DR
FreeCus is a training-free framework that enhances diffusion transformers' zero-shot subject-driven image synthesis by leveraging attention sharing, improved feature extraction, and multimodal semantic integration, enabling high-fidelity, consistent customization.
Contribution
The paper introduces a training-free method for subject-driven image synthesis with diffusion transformers that combines attention sharing, enhanced feature analysis, and multimodal semantic models.
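To show what "attention sharing" generally means in training-free customization, here is a minimal NumPy sketch. It assumes a simplified single-head setting. Queries from the image being generated attend over their own keys/values concatenated with keys/values taken from a reference-image pass, so subject features can flow into the output without any weight updates. The function name `shared_attention` and the tensor shapes are hypothetical. This is the generic technique only, not FreeCus's exact mechanism, and it omits the text tokens that a Flux-style MM-DiT also processes in joint attention.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def shared_attention(q_tgt, k_tgt, v_tgt, k_ref, v_ref):
    """Generic attention sharing (hypothetical sketch, not FreeCus's exact rule).

    Target-image queries attend over the union of target and reference
    keys/values, letting reference-subject features inform the target.
    Shapes: q_tgt, k_tgt, v_tgt: (n_tgt, d); k_ref, v_ref: (n_ref, d).
    """
    # Concatenate target and reference tokens along the sequence axis
    k = np.concatenate([k_tgt, k_ref], axis=0)  # (n_tgt + n_ref, d)
    v = np.concatenate([v_tgt, v_ref], axis=0)  # (n_tgt + n_ref, d)
    # Standard scaled dot-product attention over the shared key set
    scores = q_tgt @ k.T / np.sqrt(q_tgt.shape[-1])
    weights = softmax(scores, axis=-1)  # (n_tgt, n_tgt + n_ref)
    return weights @ v, weights  # (n_tgt, d) output and attention map
```

With 4 target tokens, 6 reference tokens and `d = 8`, the output has shape `(4, 8)` and each attention row spans all 10 tokens. Reference-to-target weights like these are where a real method would intervene, for example by restricting which reference tokens may be attended to so that subject identity is kept while the prompt stays editable.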
Findings
Achieves state-of-the-art zero-shot subject synthesis results.
Demonstrates seamless integration with existing inpainting and control modules.
Outperforms training-dependent methods in fidelity and consistency.
Abstract
In light of recent breakthroughs in text-to-image (T2I) generation, particularly with diffusion transformers (DiT), subject-driven technologies are increasingly being employed for high-fidelity customized production that preserves subject identity from reference inputs, enabling thrilling design workflows and engaging entertainment. Existing alternatives typically require either per-subject optimization via trainable text embeddings or training specialized encoders for subject feature extraction on large-scale datasets. Such dependencies on training procedures fundamentally constrain their practical applications. More importantly, current methodologies fail to fully leverage the inherent zero-shot potential of modern diffusion transformers (e.g., the Flux series) for authentic subject-driven synthesis. To bridge this gap, we propose FreeCus, a genuinely training-free framework that…
