EDT: An Efficient Diffusion Transformer Framework Inspired by Human-like Sketching

Xinwang Chen; Ning Liu; Yichen Zhu; Feifei Feng; Jian Tang

arXiv:2410.23788 · cs.CV · November 1, 2024

TL;DR

This paper introduces the Efficient Diffusion Transformer (EDT), a lightweight and computationally efficient framework for diffusion probabilistic models that improves image synthesis performance while significantly reducing training and inference costs.

Contribution

The paper proposes a lightweight diffusion transformer architecture, a training-free Attention Modulation Matrix whose alternating arrangement is inspired by human-like sketching, and a token relation-enhanced masking training strategy.

Findings

01. EDT trains and runs inference faster than MDTv2.

02. EDT surpasses existing transformer-based diffusion models in image synthesis quality.

03. The framework reduces training and inference costs while maintaining or improving image synthesis performance.

Abstract

Transformer-based Diffusion Probabilistic Models (DPMs) have shown more potential than CNN-based DPMs, yet their extensive computational requirements hinder widespread practical application. To reduce the computation budget of transformer-based DPMs, this work proposes the Efficient Diffusion Transformer (EDT) framework. The framework includes a lightweight diffusion model architecture and a training-free Attention Modulation Matrix, whose alternating arrangement in EDT is inspired by human-like sketching. Additionally, we propose a token relation-enhanced masking training strategy tailored explicitly for EDT to augment its token relation learning capability. Our extensive experiments demonstrate the efficacy of EDT. The EDT framework reduces training and inference costs and surpasses existing transformer-based diffusion models in image synthesis performance, thereby achieving a…

Code & Models

Repositories

xinwangchen/edt · PyTorch · Official

Videos

EDT: An Efficient Diffusion Transformer Framework Inspired by Human-like Sketching · SlidesLive

Taxonomy

Topics: Digital Filter Design and Implementation

Methods: Attention Is All You Need · Linear Layer · Dense Connections · Label Smoothing · Layer Normalization · Residual Connection · Position-Wise Feed-Forward Layer · Adam · Multi-Head Attention · Softmax