Unraveling MMDiT Blocks: Training-free Analysis and Enhancement of Text-conditioned Diffusion
Binglei Li, Mengping Yang, Zhiyu Tan, Junping Zhang, Hao Li

TL;DR
This paper systematically analyzes MMDiT-based diffusion models to understand their internal mechanisms and proposes training-free strategies to enhance text alignment, editing precision, and inference speed, leading to improved performance across multiple benchmarks.
Contribution
It introduces a comprehensive, training-free analysis pipeline for MMDiT models and proposes novel enhancement strategies based on these insights.
Findings
Semantic info appears in early blocks
Finer details are in later blocks
Enhancing text conditions improves semantic attributes
Abstract
Recent breakthroughs in transformer-based diffusion models, particularly models built on Multimodal Diffusion Transformers (MMDiT) such as FLUX and Qwen Image, have enabled impressive results in text-to-image generation and editing. To understand the internal mechanisms of MMDiT-based models, existing methods have analyzed the effect of specific components such as positional encoding and attention layers. Yet a comprehensive understanding of how different blocks, and their interactions with textual conditions, contribute to the synthesis process remains elusive. In this paper, we first develop a systematic pipeline that investigates each block's functionality by removing, disabling, and enhancing textual hidden states at the corresponding blocks. Our analysis reveals that 1) semantic information appears in earlier blocks and finer details are rendered in later blocks, 2)…
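The per-block probing idea in the abstract can be illustrated with a toy sketch. This is not the authors' code: the `Intervention` record, `run_blocks`, and `toy_block` are hypothetical names, "remove" is interpreted here as skipping a whole block, and "disable"/"enhance" are interpreted as feeding zeroed or scaled text hidden states to a single block while the unmodified text stream is carried forward. A real MMDiT block performs joint attention over image and text tokens; the toy block only adds the mean text token to each image token so that the effect of each intervention is easy to trace.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Tokens = List[List[float]]
# Hypothetical block signature: (image_tokens, text_tokens) -> (image_tokens, text_tokens)
Block = Callable[[Tokens, Tokens], Tuple[Tokens, Tokens]]


@dataclass
class Intervention:
    block_index: int
    mode: str  # "remove", "disable_text", or "enhance_text"
    scale: float = 1.0  # text scaling factor, used only by "enhance_text"


def scale_tokens(tokens: Tokens, factor: float) -> Tokens:
    """Return a new token list with every entry multiplied by `factor`."""
    return [[factor * x for x in tok] for tok in tokens]


def run_blocks(blocks: Sequence[Block], img: Tokens, txt: Tokens,
               intervention: Optional[Intervention] = None) -> Tokens:
    """Run the block stack, applying at most one per-block intervention."""
    for i, block in enumerate(blocks):
        if intervention is None or intervention.block_index != i:
            img, txt = block(img, txt)
        elif intervention.mode == "remove":
            continue  # skip the block: both streams pass through unchanged
        elif intervention.mode in ("disable_text", "enhance_text"):
            factor = 0.0 if intervention.mode == "disable_text" else intervention.scale
            # The image stream sees zeroed or scaled text states at this block only;
            # the unmodified text states are carried forward (a simplifying assumption).
            img, _ = block(img, scale_tokens(txt, factor))
        else:
            raise ValueError(f"unknown mode: {intervention.mode}")
    return img


def toy_block(img: Tokens, txt: Tokens) -> Tuple[Tokens, Tokens]:
    """Hypothetical stand-in for an MMDiT block: image tokens absorb the mean text token."""
    dim = len(img[0])
    mean_txt = [sum(t[d] for t in txt) / len(txt) for d in range(dim)]
    new_img = [[x + m for x, m in zip(tok, mean_txt)] for tok in img]
    return new_img, txt


if __name__ == "__main__":
    blocks = [toy_block, toy_block]
    img0, txt0 = [[0.0, 0.0]], [[1.0, 2.0]]
    print(run_blocks(blocks, img0, txt0))                                  # [[2.0, 4.0]]
    print(run_blocks(blocks, img0, txt0, Intervention(0, "remove")))        # [[1.0, 2.0]]
    print(run_blocks(blocks, img0, txt0, Intervention(0, "disable_text")))  # [[1.0, 2.0]]
    print(run_blocks(blocks, img0, txt0, Intervention(0, "enhance_text", 2.0)))  # [[3.0, 6.0]]
```

Comparing each intervened output against the baseline, block by block, is the kind of signal such an analysis relies on; in a real model the comparison would be made on generated images rather than on toy token values.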
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Digital Humanities and Scholarship · Digital Media and Philosophy
