FasterDiT: Towards Faster Diffusion Transformers Training without Architecture Modification

Jingfeng Yao; Wang Cheng; Wenyu Liu; Xinggang Wang

arXiv:2410.10356 · cs.CV · November 1, 2024

TL;DR

FasterDiT introduces a simple, practical training acceleration strategy for Diffusion Transformers that significantly reduces training time without changing the architecture, by leveraging a new perspective on training strategies and supervision methods.

Contribution

It proposes a new interpretation of training-strategy failures based on the probability density function (PDF) of the signal-to-noise ratio (SNR), and develops a new supervision method that accelerates DiT training without architectural modifications.
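
This page does not spell out the supervision method. The sketch below assumes it adds a direction term to the usual velocity-regression loss: mean squared error plus a penalty on the angle between predicted and target velocity, measured by cosine similarity. The names `velocity_loss` and `direction_weight` and the equal weighting are hypothetical, chosen for illustration only; this is not the authors' implementation.

```python
import math
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error: the usual pointwise velocity-regression term."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cosine_similarity(a: Sequence[float], b: Sequence[float], eps: float = 1e-8) -> float:
    """Cosine of the angle between two vectors (1 = same direction, -1 = opposite)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / max(norm_a * norm_b, eps)


def velocity_loss(pred: Sequence[float], target: Sequence[float],
                  direction_weight: float = 1.0) -> float:
    """Hypothetical combined loss: MSE plus a penalty on direction mismatch.

    The direction term (1 - cos) is 0 whenever pred points the same way as
    target, regardless of magnitude, and reaches 2 when they point in
    opposite directions.
    """
    return mse(pred, target) + direction_weight * (1.0 - cosine_similarity(pred, target))


if __name__ == "__main__":
    target = [1.0, 0.0]
    print(velocity_loss([2.0, 0.0], target))  # right direction, wrong magnitude: MSE term only
    print(velocity_loss([0.0, 1.0], target))  # orthogonal: MSE plus a direction penalty of 1
```

The cosine term is scale-invariant, so it isolates direction errors that the MSE term mixes together with magnitude errors. In a PyTorch training loop the same idea would be written with tensor operations over each sample's flattened velocity.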

Findings

01. Achieves 2.30 FID on ImageNet 256×256 at 1000k iterations

02. Trains about 7 times faster than standard DiT while reaching comparable FID (2.30 vs. 2.27)

03. Provides over one hundred experimental results for strategy validation

Abstract

Diffusion Transformers (DiT) have attracted significant attention in research. However, they converge slowly. In this paper, we aim to accelerate DiT training without any architectural modification. We identify two issues in the training process: first, certain training strategies do not perform consistently across different data; second, the effectiveness of supervision at specific timesteps is limited. In response, we propose the following contributions: (1) We introduce a new perspective for interpreting the failure of these strategies. Specifically, we slightly extend the definition of the Signal-to-Noise Ratio (SNR) and suggest observing the Probability Density Function (PDF) of SNR to understand the essence of a strategy's robustness to the data. (2) We conduct numerous experiments and report over one hundred experimental results to empirically…
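
The idea of reading a strategy through the PDF of SNR can be illustrated numerically. This is a minimal sketch under stated assumptions, not the paper's exact definitions: a rectified-flow path x_t = (1 - t)·x + t·ε, a logit-normal timestep sampler, and an extended SNR that folds the data standard deviation s into the signal term, SNR(t) = ((1 - t)·s / t)². All function names are hypothetical. The code estimates the distribution of log-SNR and shows that changing only the data scale shifts it, even though the timestep strategy is unchanged.

```python
import math
import random
import statistics


def sample_t_logit_normal(rng: random.Random, mean: float = 0.0, std: float = 1.0) -> float:
    """Assumed timestep sampler: t = sigmoid(u) with u ~ N(mean, std)."""
    u = rng.gauss(mean, std)
    t = 1.0 / (1.0 + math.exp(-u))
    return min(max(t, 1e-7), 1.0 - 1e-7)  # keep t strictly inside (0, 1)


def extended_log_snr(t: float, data_std: float) -> float:
    """log SNR for x_t = (1 - t) * x + t * eps, with the data std in the signal.

    Assumed definition: SNR(t) = ((1 - t) * data_std / t) ** 2.
    """
    return 2.0 * (math.log1p(-t) - math.log(t) + math.log(data_std))


def log_snr_samples(n: int, data_std: float, mean: float = 0.0,
                    std: float = 1.0, seed: int = 0) -> list:
    """Draw n timesteps and map each one to its extended log-SNR."""
    rng = random.Random(seed)
    return [extended_log_snr(sample_t_logit_normal(rng, mean, std), data_std)
            for _ in range(n)]


def histogram_pdf(values: list, lo: float, hi: float, bins: int) -> list:
    """Normalized histogram: an empirical estimate of the log-SNR PDF on [lo, hi)."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if lo <= v < hi:
            counts[min(int((v - lo) / width), bins - 1)] += 1
    return [c / (len(values) * width) for c in counts]


if __name__ == "__main__":
    # Same timestep sampler, three different data scales.
    for s in (0.5, 1.0, 2.0):
        xs = log_snr_samples(20_000, data_std=s)
        print(f"data_std={s}: mean log-SNR {statistics.mean(xs):.2f}, "
              f"stdev {statistics.stdev(xs):.2f}")
```

Under these assumptions log-SNR = -2u + 2 ln s, so its PDF is Gaussian with mean -2·mean + 2 ln s and standard deviation 2·std; the printed means should land near 2 ln s (roughly -1.39, 0, 1.39) with a standard deviation near 2. The sampler is identical in all three runs, yet the SNR distribution it induces moves with the data, which is the kind of data dependence the abstract proposes to inspect through the SNR PDF.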

Code & Models

Repositories

hustvl/LightningDiT (PyTorch)

Models

hustvl/vavae-imagenet256-f16d32-dinov2 (Hugging Face model)

Videos

FasterDiT: Towards Faster Diffusion Transformers Training without Architecture Modification (SlidesLive)
