TREAD: Token Routing for Efficient Architecture-agnostic Diffusion Training

Felix Krause; Timy Phan; Ming Gui; Stefan Andreas Baumann; Vincent Tao Hu; Björn Ommer

arXiv:2501.04765 · cs.CV · October 14, 2025

TL;DR

TREAD introduces a token routing mechanism that improves both the training efficiency and the generative performance of diffusion models across architectures, lowering training cost while raising sample quality on ImageNet-256.

Contribution

This work proposes a novel token routing method that improves diffusion model training efficiency and performance without architectural modifications or extra parameters.
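Why routing saves compute can be seen from a rough count. The notation is an assumption made for illustration and the numbers are not taken from the paper: consider a model with L blocks applied to N tokens, and a route that lets a fraction r of the tokens bypass k consecutive blocks. Counting only costs that grow linearly with the number of tokens (MLPs and linear projections), the cost relative to the unrouted model is

```latex
\frac{C_{\text{routed}}}{C_{\text{full}}}
  \approx \frac{(L - k)\,N + k\,(1 - r)\,N}{L\,N}
  = 1 - \frac{r\,k}{L}
```

The attention-score computation grows with the square of the sequence length, so inside the route it drops to roughly (1 - r)^2 of its original cost, and the actual saving is somewhat larger. With illustrative values r = 0.5 and k = L/2, the linear part of the cost falls to 1 - 0.5 · 0.5 = 0.75 of the full model. Because the route only skips computation, it needs no additional parameters.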

Findings

01. 14x faster convergence than DiT at 400K training iterations
02. 37x speedup compared to DiT's best benchmark performance at 7M training iterations
03. State-of-the-art FID scores on ImageNet-256

Abstract

Diffusion models have emerged as the mainstream approach for visual generation. However, these models typically suffer from sample inefficiency and high training costs. Consequently, methods for efficient finetuning, inference and personalization were quickly adopted by the community. However, training these models in the first place remains very costly. While several recent approaches, including masking, distillation, and architectural modifications, have been proposed to improve training efficiency, each of these methods comes with a tradeoff: they achieve enhanced performance at the expense of increased computational cost or vice versa. In contrast, this work aims to improve training efficiency as well as generative performance at the same time through routes that act as a transport mechanism for randomly selected tokens from early layers to deeper layers of the model. Our method…
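The routing idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (the official PyTorch code is in compvis/tread); the toy block, the function names `toy_block` and `forward_with_route`, and the choice of a fixed routing ratio with uniformly random token selection are all hypothetical. A random subset of tokens is set aside after an early block, the intermediate blocks process only the remaining tokens, and the set-aside tokens are re-inserted at their original positions before a deeper block:

```python
import numpy as np


def toy_block(seed: int, dim: int):
    """Hypothetical stand-in for a transformer block (not from the paper).

    A per-token MLP plus a mean-pooled context term, so tokens interact
    roughly the way they would through attention.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((dim, dim)) / np.sqrt(dim)

    def block(x: np.ndarray) -> np.ndarray:
        # x has shape (num_tokens, dim); the residual update keeps that shape.
        context = x.mean(axis=0, keepdims=True)
        return x + np.tanh(x @ w) + 0.1 * context

    return block


def forward_with_route(x, blocks, route_start, route_end, route_ratio, rng):
    """Apply `blocks` to tokens `x`, letting a random subset of tokens skip
    blocks[route_start:route_end] and rejoin, unchanged, before blocks[route_end]."""
    for blk in blocks[:route_start]:
        x = blk(x)

    # Randomly pick which tokens take the route (assumed: uniform, fixed ratio).
    num_tokens = x.shape[0]
    num_routed = int(round(route_ratio * num_tokens))
    perm = rng.permutation(num_tokens)
    routed_idx, kept_idx = perm[:num_routed], perm[num_routed:]
    routed = x[routed_idx]  # carried forward without any computation
    kept = x[kept_idx]

    # Only the shorter, non-routed sequence is processed inside the route.
    for blk in blocks[route_start:route_end]:
        kept = blk(kept)

    # Re-insert routed tokens at their original positions.
    merged = np.empty_like(x)
    merged[routed_idx] = routed
    merged[kept_idx] = kept

    for blk in blocks[route_end:]:
        merged = blk(merged)
    return merged, routed_idx


# Usage: 16 tokens of width 8 through 6 blocks; half the tokens skip blocks 1-3.
rng = np.random.default_rng(0)
dim, num_tokens, depth = 8, 16, 6
blocks = [toy_block(seed, dim) for seed in range(depth)]
x = rng.standard_normal((num_tokens, dim))
y, routed_idx = forward_with_route(x, blocks, route_start=1, route_end=4,
                                   route_ratio=0.5, rng=rng)
print(y.shape, len(routed_idx))
```

The route itself is only an indexing operation, so it adds no parameters, and the intermediate blocks run on a shorter sequence. How TREAD actually selects tokens, where its routes start and end, and how routing is handled at inference are details this sketch does not attempt to reproduce.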

Code & Models

Repositories

compvis/tread (PyTorch, official)

Models

KBlueLeaf/HDM-xut-340M-anime (Hugging Face model)

Taxonomy

Topics: Adversarial Robustness in Machine Learning

Methods: Diffusion