Causal Diffusion Transformers for Generative Modeling
Chaorui Deng, Deyao Zhu, Kunchang Li, Shi Guang, Haoqi Fan

TL;DR
This paper introduces Causal Diffusion, an autoregressive counterpart to diffusion models that is compatible with existing next-token prediction models, improves generative performance, and enables flexible multimodal in-context reasoning and manipulation.
Contribution
The paper proposes CausalFusion, a decoder-only transformer that dual-factorizes data across sequential tokens and diffusion noise levels, achieving state-of-the-art results on the ImageNet generation benchmark and supporting multimodal generation.
Findings
State-of-the-art ImageNet generation results
Effective multimodal image generation and captioning
Zero-shot in-context image manipulation capabilities
Abstract
We introduce Causal Diffusion as the autoregressive (AR) counterpart of Diffusion models. It is a next-token(s) forecasting framework that is friendly to both discrete and continuous modalities and compatible with existing next-token prediction models like LLaMA and GPT. While recent works attempt to combine diffusion with AR models, we show that introducing sequential factorization to a diffusion model can substantially improve its performance and enable a smooth transition between AR and diffusion generation modes. Hence, we propose CausalFusion, a decoder-only transformer that dual-factorizes data across sequential tokens and diffusion noise levels, leading to state-of-the-art results on the ImageNet generation benchmark while also enjoying the AR advantage of generating an arbitrary number of tokens for in-context reasoning. We further demonstrate CausalFusion's multimodal…
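The "dual factorization" idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the helper names (`split_into_ar_steps`, `block_causal_mask`, `sample_noise_levels`), the block-causal masking rule, and the uniform noise-level sampling are all assumptions made for illustration. Tokens are grouped into consecutive AR steps, each step gets its own diffusion noise level, and a token may attend only to its own step and earlier ones.

```python
import random


def split_into_ar_steps(num_tokens, step_sizes):
    """Partition token indices 0..num_tokens-1 into consecutive AR steps."""
    assert sum(step_sizes) == num_tokens, "step sizes must cover every token"
    steps, start = [], 0
    for size in step_sizes:
        steps.append(list(range(start, start + size)))
        start += size
    return steps


def block_causal_mask(steps):
    """Return mask[i][j] == True when token i may attend to token j.

    Assumed rule (illustrative): j must belong to the same AR step as i
    or to an earlier step.
    """
    step_of = {}
    for step_index, token_indices in enumerate(steps):
        for token in token_indices:
            step_of[token] = step_index
    n = len(step_of)
    return [[step_of[j] <= step_of[i] for j in range(n)] for i in range(n)]


def sample_noise_levels(steps, num_levels, rng):
    """Draw one diffusion noise level per AR step (hypothetical schedule)."""
    return [rng.randrange(num_levels) for _ in steps]


if __name__ == "__main__":
    steps = split_into_ar_steps(6, [2, 3, 1])
    mask = block_causal_mask(steps)
    levels = sample_noise_levels(steps, num_levels=1000, rng=random.Random(0))
    for row in mask:
        print("".join("1" if allowed else "0" for allowed in row))
    print("noise level per AR step:", levels)
```

Under these assumptions the two extremes recover familiar cases: a single step containing every token gives full attention (a plain diffusion setup), while steps of size one give a strictly causal mask (plain next-token AR). Intermediate step sizes sit between the two, which is one way to picture the transition between AR and diffusion modes that the abstract mentions.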
Taxonomy
Topics: Simulation Techniques and Applications · Gene Regulatory Network Analysis · Cellular Automata and Applications
Methods: Linear Layer · Multi-Head Attention · Cosine Annealing · Residual Connection · Attention Dropout · Linear Warmup With Cosine Annealing · Discriminative Fine-Tuning · Weight Decay · Softmax
