FasterDiT: Towards Faster Diffusion Transformers Training without Architecture Modification
Jingfeng Yao, Wang Cheng, Wenyu Liu, Xinggang Wang

TL;DR
FasterDiT is a simple, practical training strategy for Diffusion Transformers (DiT) that substantially shortens training without changing the architecture. It combines two ideas: analyzing the probability density function (PDF) of the signal-to-noise ratio (SNR) to explain why training strategies fail on some data, and a new supervision method.
Contribution
The paper interprets the failure of training strategies by analyzing the PDF of the SNR, and develops a new supervision method that accelerates DiT training without architectural modifications.
Findings
Achieves 2.30 FID on ImageNet at 256 resolution after 1000k training iterations
Trains about 7 times faster than standard DiT
Provides over one hundred experimental results for strategy validation
Abstract
Diffusion Transformers (DiT) have attracted significant attention in research. However, they suffer from a slow convergence rate. In this paper, we aim to accelerate DiT training without any architectural modification. We identify the following issues in the training process: first, certain training strategies do not consistently perform well across different data; second, the effectiveness of supervision at specific timesteps is limited. In response, we propose the following contributions: (1) We introduce a new perspective for interpreting the failure of the strategies. Specifically, we slightly extend the definition of Signal-to-Noise Ratio (SNR) and suggest observing the Probability Density Function (PDF) of SNR to understand the essence of the data robustness of the strategy. (2) We conduct numerous experiments and report over one hundred experimental results to empirically…
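To make the "PDF of SNR" idea concrete, here is a minimal sketch of how one might estimate the density of the log-SNR that a given timestep sampling scheme exposes the model to. It is not the paper's implementation. Everything specific in it is an assumption made for illustration: the linear noising path x_t = (1 - t) * x + t * eps, the choice to scale signal power by the data standard deviation, the two sampling schemes, and all function names.

```python
import numpy as np


def log_snr(t, data_std=1.0):
    """Log-SNR for the assumed path x_t = (1 - t) * x + t * eps.

    Assumption: signal power is scaled by data_std**2, so
    SNR(t) = ((1 - t) * data_std / t) ** 2. This is an illustrative
    extension of SNR, not necessarily the paper's definition.
    """
    t = np.asarray(t, dtype=np.float64)
    return 2.0 * (np.log1p(-t) + np.log(data_std) - np.log(t))


def sample_timesteps(n, scheme="uniform", rng=None, eps=1e-4):
    """Draw n timesteps in (0, 1) under a hypothetical sampling scheme."""
    rng = np.random.default_rng(0) if rng is None else rng
    if scheme == "uniform":
        t = rng.uniform(0.0, 1.0, size=n)
    elif scheme == "lognorm":
        # Logit-normal: push a standard normal sample through a sigmoid.
        t = 1.0 / (1.0 + np.exp(-rng.normal(0.0, 1.0, size=n)))
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    # Keep t away from 0 and 1 so the logarithms stay finite.
    return np.clip(t, eps, 1.0 - eps)


def log_snr_pdf(t, data_std=1.0, bins=50, value_range=(-10.0, 10.0)):
    """Histogram estimate of the log-SNR density seen during training."""
    values = log_snr(t, data_std)
    density, edges = np.histogram(
        values, bins=bins, range=value_range, density=True
    )
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, density


if __name__ == "__main__":
    t = sample_timesteps(100_000, scheme="lognorm")
    for std in (0.5, 1.0, 2.0):
        centers, density = log_snr_pdf(t, data_std=std)
        print(f"data_std={std}: density peaks near log-SNR {centers[np.argmax(density)]:.2f}")
```

Under these assumptions, changing the data scale shifts the whole log-SNR density by 2 * log(data_std) while the timestep sampler stays the same, which is one way to picture how a fixed training strategy can behave differently across data.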
Taxonomy
Topics: Electronic and Structural Properties of Oxides · Ferroelectric and Negative Capacitance Devices · Advanced Memory and Neural Computing
Methods: Softmax · Attention Is All You Need
