Latent Space Disentanglement in Diffusion Transformers Enables Zero-shot Fine-grained Semantic Editing
Zitao Shuai, Chenwei Wu, Zhengxu Tang, Bowen Song, Liyue Shen

TL;DR
This paper investigates the latent space of Diffusion Transformers, revealing its inherent disentanglement of text and image semantics, and introduces a zero-shot editing framework that leverages this property for precise image modifications.
Contribution
The paper uncovers the semantic disentanglement in DiT's latent space and, building on this insight, proposes a novel Extract-Manipulate-Sample (EMS) framework for zero-shot fine-grained image editing.
Findings
Latent spaces in DiTs are inherently decomposable into text and image components.
Disentangled latent spaces enable precise semantic control in image editing.
The proposed EMS framework achieves effective zero-shot fine-grained image editing.
Abstract
Diffusion Transformers (DiTs) have achieved remarkable success in diverse and high-quality text-to-image (T2I) generation. However, how text and image latents individually and jointly contribute to the semantics of generated images remains largely unexplored. Through our investigation of DiT's latent space, we have uncovered key findings that unlock the potential for zero-shot fine-grained semantic editing: (1) Both the text and image spaces in DiTs are inherently decomposable. (2) These spaces collectively form a disentangled semantic representation space, enabling precise and fine-grained semantic control. (3) Effective image editing requires the combined use of both text and image latent spaces. Leveraging these insights, we propose a simple and effective Extract-Manipulate-Sample (EMS) framework for zero-shot fine-grained image editing. Our approach first utilizes a multi-modal Large…
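To make "decomposable" and "disentangled" concrete, here is a toy sketch. It is not the EMS framework and does not come from the paper. It assumes a latent is a linear sum of orthogonal, unit-norm attribute directions; the directions COLOR, SHAPE and TEXTURE and the helpers compose and edit_attribute are hypothetical. Under that assumption, changing one attribute's coefficient leaves every other attribute's coefficient untouched. This is the property that makes fine-grained edits possible.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def compose(strengths, directions):
    """Build a latent as a weighted sum of attribute directions (toy 'decomposable' latent)."""
    dim = len(next(iter(directions.values())))
    latent = [0.0] * dim
    for name, s in strengths.items():
        latent = [x + s * d for x, d in zip(latent, directions[name])]
    return latent


def edit_attribute(latent, direction, new_strength):
    """Set the latent's component along a unit-norm `direction` to `new_strength`.

    Only the projection onto `direction` changes; orthogonal components are preserved.
    """
    delta = new_strength - dot(latent, direction)
    return [x + delta * d for x, d in zip(latent, direction)]


# Hypothetical orthogonal semantic directions in a 4-d toy latent space.
COLOR = normalize([1.0, 1.0, 0.0, 0.0])
SHAPE = normalize([1.0, -1.0, 0.0, 0.0])
TEXTURE = [0.0, 0.0, 1.0, 0.0]
DIRECTIONS = {"color": COLOR, "shape": SHAPE, "texture": TEXTURE}

latent = compose({"color": 2.0, "shape": -1.0, "texture": 0.5}, DIRECTIONS)
edited = edit_attribute(latent, COLOR, 3.5)  # change only the "color" attribute

for name, d in DIRECTIONS.items():
    print(name, round(dot(latent, d), 6), "->", round(dot(edited, d), 6))
```

Only the color coefficient moves (2.0 to 3.5), while shape and texture keep their values. In a real DiT the semantic directions are neither known in advance nor guaranteed to be exactly orthogonal, so this sketch illustrates the idea only.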
Taxonomy
Topics: Innovative Microfluidic and Catalytic Techniques Innovation
Methods: Diffusion
