EDT: An Efficient Diffusion Transformer Framework Inspired by Human-like Sketching
Xinwang Chen, Ning Liu, Yichen Zhu, Feifei Feng, Jian Tang

TL;DR
This paper introduces the Efficient Diffusion Transformer (EDT), a lightweight and computationally efficient framework for diffusion probabilistic models that improves image synthesis performance while significantly reducing training and inference costs.
Contribution
The paper proposes a lightweight diffusion transformer architecture, a training-free Attention Modulation Matrix (with an alternating arrangement inside EDT) inspired by human-like sketching, and a token relation-enhanced masking training strategy tailored to EDT.
Findings
EDT trains and runs inference faster than MDTv2.
EDT surpasses existing transformer-based diffusion models in image synthesis quality.
The framework reduces computational costs while maintaining or improving performance.
Abstract
Transformer-based Diffusion Probabilistic Models (DPMs) have shown more potential than CNN-based DPMs, yet their extensive computational requirements hinder widespread practical applications. To reduce the computation budget of transformer-based DPMs, this work proposes the Efficient Diffusion Transformer (EDT) framework. The framework includes a lightweight-design diffusion model architecture, and a training-free Attention Modulation Matrix and its alternation arrangement in EDT inspired by human-like sketching. Additionally, we propose a token relation-enhanced masking training strategy tailored explicitly for EDT to augment its token relation learning capability. Our extensive experiments demonstrate the efficacy of EDT. The EDT framework reduces training and inference costs and surpasses existing transformer-based diffusion models in image synthesis performance, thereby achieving a…
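The abstract names a training-free Attention Modulation Matrix but does not spell out its form. As a rough illustration only, the sketch below shows one way a fixed, non-learned matrix could modulate attention weights without any retraining. Everything specific here is an assumption made for illustration and is not taken from the paper: the Gaussian decay with spatial distance between tokens, the `sigma` parameter, applying the matrix after the softmax, the renormalization step, and the helper names `distance_modulation` and `modulated_attention`.

```python
import numpy as np


def distance_modulation(grid_h, grid_w, sigma=2.0):
    """Build a fixed (non-learned) N x N matrix, N = grid_h * grid_w.

    Hypothetical form: entries decay with the squared Euclidean distance
    between token positions on the 2D grid. Not the paper's definition.
    """
    ys, xs = np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij")
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(np.float64)  # (N, 2)
    diff = coords[:, None, :] - coords[None, :, :]                          # (N, N, 2)
    dist2 = (diff ** 2).sum(axis=-1)                                        # (N, N)
    return np.exp(-dist2 / (2.0 * sigma ** 2))


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def modulated_attention(q, k, v, modulation):
    """Single-head attention whose weights are rescaled by a fixed matrix.

    Assumption: the modulation is applied elementwise to the softmax
    weights, and each row is then renormalized to sum to 1.
    """
    d = q.shape[-1]
    weights = softmax(q @ k.T / np.sqrt(d), axis=-1)          # (N, N)
    weights = weights * modulation                            # elementwise modulation
    weights = weights / weights.sum(axis=-1, keepdims=True)   # renormalize rows
    return weights @ v, weights


# Usage on a 4 x 4 token grid with random features.
rng = np.random.default_rng(0)
q = rng.normal(size=(16, 8))
k = rng.normal(size=(16, 8))
v = rng.normal(size=(16, 8))
M = distance_modulation(4, 4, sigma=1.5)
out, w = modulated_attention(q, k, v, M)
```

Because the matrix is computed from token positions alone, it introduces no trainable parameters, which is the only property this sketch is meant to convey. The paper itself should be consulted for the actual matrix and for how it is arranged across EDT blocks.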
Taxonomy
Topics: Digital Filter Design and Implementation
Methods: Attention Is All You Need · Linear Layer · Dense Connections · Label Smoothing · Layer Normalization · Residual Connection · Position-Wise Feed-Forward Layer · Adam · Multi-Head Attention · Softmax
