Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators

Yifan Pu; Zhuofan Xia; Jiayi Guo; Dongchen Han; Qixiu Li; Duo Li; Yuhui Yuan; Ji Li; Yizeng Han; Shiji Song; Gao Huang; Xiu Li

arXiv:2408.05710·cs.CV·August 13, 2024

TL;DR

This paper introduces a diffusion transformer that routes attention through a set of mediator tokens and adjusts their number across denoising steps. This reduces computational cost and improves image quality by exploiting query-key redundancy that is most pronounced in the early denoising steps.

Contribution

We propose a novel diffusion transformer framework with mediator tokens and dynamic adjustment mechanisms, enhancing efficiency and image quality over prior models.

Findings

01. Reduces inference FLOPs significantly.

02. Achieves state-of-the-art FID score of 2.01.

03. Improves image quality with lower computational cost.

Abstract

This paper identifies significant redundancy in the query-key interactions within self-attention mechanisms of diffusion transformer models, particularly during the early stages of denoising diffusion steps. In response to this observation, we present a novel diffusion transformer framework incorporating an additional set of mediator tokens to engage with queries and keys separately. By modulating the number of mediator tokens during the denoising generation phases, our model initiates the denoising process with a precise, non-ambiguous stage and gradually transitions to a phase enriched with detail. Concurrently, integrating mediator tokens simplifies the attention module's complexity to a linear scale, enhancing the efficiency of global attention processes. Additionally, we propose a time-step dynamic mediator token adjustment mechanism that further decreases the required…
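
The two-stage attention described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the single-head layout, and especially the way mediator tokens are formed (average pooling of the queries) are assumptions made here for illustration. Mediators first attend to the keys to aggregate the values, then queries attend to the mediators, so the cost grows with n·m rather than n² for n tokens and m mediators.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def mediator_attention(q, k, v, num_mediators):
    """Single-head attention routed through mediator tokens (illustrative).

    q, k, v: arrays of shape (n, d). Returns an array of shape (n, d).
    """
    n, d = q.shape
    if not 1 <= num_mediators <= n:
        raise ValueError("num_mediators must be between 1 and n")

    # ASSUMPTION: mediators are built by average-pooling contiguous
    # chunks of the queries. The abstract does not say how they are formed.
    chunks = np.array_split(q, num_mediators, axis=0)
    t = np.stack([c.mean(axis=0) for c in chunks])  # (m, d)

    scale = 1.0 / np.sqrt(d)
    # Stage 1: mediators interact with keys and aggregate values.
    mediator_values = softmax(t @ k.T * scale) @ v  # (m, d)
    # Stage 2: queries interact with mediators only.
    return softmax(q @ t.T * scale) @ mediator_values  # (n, d)


def attention_macs(n, d, m=None):
    """Multiply-accumulate count of the matrix products in one head.

    Full attention: Q K^T and A V, each n*n*d.
    Mediator attention: four products, each n*m*d.
    """
    if m is None:
        return 2 * n * n * d
    return 4 * n * m * d


rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((256, 64)) for _ in range(3))
out = mediator_attention(q, k, v, num_mediators=4)
```

Under this sketch, using fewer mediators in early denoising steps and more in later ones would correspond to passing a step-dependent `num_mediators`. The concrete schedule is not given in the truncated abstract, so none is assumed here. The ratio `attention_macs(n, d) / attention_macs(n, d, m)` simplifies to n / (2m), which shows why the saving grows with sequence length.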

Code & Models

Repositories

leaplabthu/attention-mediators (PyTorch, official)

Taxonomy

Topics: Analog and Mixed-Signal Circuit Design · Blind Source Separation Techniques · Neural Networks and Reservoir Computing

Methods: Softmax · Attention Is All You Need · Sparse Evolutionary Training · Diffusion