DiverseDiT: Towards Diverse Representation Learning in Diffusion Transformers
Mengping Yang, Zhiyu Tan, Binglei Li, Xiaomeng Yang, Hesen Chen, Hao Li

TL;DR
This paper investigates the internal representation dynamics of Diffusion Transformers and introduces DiverseDiT, a framework that promotes diverse internal representations to improve performance and convergence in visual synthesis tasks.
Contribution
The paper reveals the importance of representation diversity in DiTs and proposes DiverseDiT, which explicitly encourages diverse features through residual connections and a diversity loss.
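The summary names a "diversity loss" but does not give its form. One common way to make "diverse features across blocks" concrete is to penalize how similar the blocks' representations are. The sketch below is hypothetical and is not the authors' loss. It mean-pools each block's tokens into one vector and returns the mean pairwise cosine similarity between blocks, so minimizing it pushes blocks apart. The function names `mean_pool`, `cosine`, and `diversity_loss` are illustrative only.

```python
import math
from itertools import combinations
from typing import List, Sequence

Vec = Sequence[float]


def mean_pool(tokens: List[Vec]) -> List[float]:
    """Average a block's token features into a single vector."""
    dim = len(tokens[0])
    return [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]


def cosine(a: Vec, b: Vec, eps: float = 1e-8) -> float:
    """Cosine similarity between two feature vectors (eps avoids divide-by-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + eps)


def diversity_loss(block_tokens: List[List[Vec]]) -> float:
    """Hypothetical diversity loss: mean pairwise cosine similarity between
    the pooled features of different blocks. Lower means more diverse blocks."""
    pooled = [mean_pool(tokens) for tokens in block_tokens]
    pairs = list(combinations(range(len(pooled)), 2))
    if not pairs:
        return 0.0
    return sum(cosine(pooled[i], pooled[j]) for i, j in pairs) / len(pairs)
```

Identical block features give a loss near 1.0, and mutually orthogonal ones give 0.0. A real DiT implementation would compute this on batched tensors inside the training graph and weight it against the diffusion objective. That weighting, like the loss itself, is an assumption here.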
Findings
DiverseDiT improves performance across different backbones and sizes.
DiverseDiT accelerates convergence in diffusion models.
DiverseDiT complements existing representation learning methods.
Abstract
Recent breakthroughs in Diffusion Transformers (DiTs) have revolutionized the field of visual synthesis due to their superior scalability. To facilitate DiTs' capability of capturing meaningful internal representations, recent works such as REPA incorporate external pretrained encoders for representation alignment. However, the underlying mechanisms governing representation learning within DiTs are not well understood. To this end, we first systematically investigate the representation dynamics of DiTs. Through analyzing the evolution and influence of internal representations under various settings, we reveal that representation diversity across blocks is a crucial factor for effective learning. Based on this key insight, we propose DiverseDiT, a novel framework that explicitly promotes representation diversity. DiverseDiT incorporates long residual connections to diversify input…
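The abstract is cut off before it specifies how the long residual connections are wired. As a rough illustration of how long skips can "diversify input" to later blocks, the sketch below assumes a U-Net-style layout. Each block in the second half of the stack receives the output of its mirrored first-half block. The two are combined by simple addition, which is also an assumption. Blocks are plain callables on lists so the wiring is easy to trace. `forward_with_long_skips` is a hypothetical name, not part of any released code.

```python
from typing import Callable, List

Vec = List[float]
Block = Callable[[Vec], Vec]


def add(a: Vec, b: Vec) -> Vec:
    """Element-wise sum of two feature vectors."""
    return [x + y for x, y in zip(a, b)]


def forward_with_long_skips(x: Vec, blocks: List[Block]) -> List[Vec]:
    """Run blocks in order. A block i whose mirror index (n - 1 - i) is
    smaller than i also receives that shallow block's output through an
    additive long skip (hypothetical wiring). Returns every block's output."""
    n = len(blocks)
    outputs: List[Vec] = []
    h = x
    for i, block in enumerate(blocks):
        mirror = n - 1 - i
        if mirror < i:  # deep block with a shallow partner
            h = add(h, outputs[mirror])
        h = block(h)
        outputs.append(h)
    return outputs
```

With four identity blocks and input `[1.0]`, the outputs are `[[1.0], [1.0], [2.0], [3.0]]`. The deep blocks see inputs that mix shallow and intermediate features rather than only the previous block's output. In a real DiT the blocks would be transformer layers on token tensors, and a learned projection might replace plain addition.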
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
