Transformers Learn the Optimal DDPM Denoiser for Multi-Token GMMs

Hongkang Li; Hancheng Min; Rene Vidal

arXiv:2604.10074·cs.LG·April 14, 2026

TL;DR

This paper gives what the authors describe as the first convergence analysis of transformer-based diffusion models. It shows how these models approximate the optimal denoiser and under what conditions training converges when the data follow a multi-token Gaussian mixture.

Contribution

It offers a theoretical explanation of why transformers work well in diffusion models, covering the conditions for convergence and the role of self-attention in denoising.

Findings

01. Transformer models trained with gradient-based methods can converge to the Bayes-optimal denoising risk.

02. Self-attention modules implement a mean denoising mechanism.

03. Numerical experiments validate the theoretical analysis.
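To make "Bayes-optimal denoising" and the link to attention concrete, here is a small sketch built on assumptions of our own. It uses a single token x0 in R^d drawn from an isotropic mixture sum_k pi_k N(mu_k, s2 I) and the standard DDPM forward step xt = sqrt(abar) x0 + sqrt(1 - abar) eps. It does not reproduce the paper's multi-token model or its transformer. Under squared error, the Bayes-optimal denoiser is the posterior mean E[x0 | xt]. For this mixture, that mean is a softmax-weighted average of per-component posterior means, and the softmax logits are dot products between the "query" xt and "keys" sqrt(abar) mu_k plus a per-key bias. This offers one informal way to see why an attention-like operation can express a mean denoiser. The names `posterior_mean` and `mc_risks` are hypothetical, and the baseline xt / sqrt(abar) is included only for comparison.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def posterior_mean(xt, means, weights, s2, alpha_bar):
    """E[x0 | xt] for x0 ~ sum_k pi_k N(mu_k, s2 I) and xt = a*x0 + b*eps.

    Toy single-token setting (an assumption, not the paper's model).
    Returns the denoised estimate and the posterior component weights.
    """
    a = math.sqrt(alpha_bar)
    v = a * a * s2 + (1.0 - alpha_bar)  # per-coordinate variance of xt given k
    # Attention-style logits: query xt, key a*mu_k, plus a per-key bias.
    # The -||xt||^2 / (2v) term is shared by every k and cancels in the softmax.
    scores = [
        dot(xt, [a * m for m in mu]) / v + math.log(pk) - a * a * dot(mu, mu) / (2 * v)
        for mu, pk in zip(means, weights)
    ]
    w = softmax(scores)
    shrink = a * s2 / v  # Gaussian-conditioning gain within one component
    # "Values": per-component posterior means mu_k + shrink * (xt - a*mu_k).
    est = [0.0] * len(xt)
    for wk, mu in zip(w, means):
        for i, (x, m) in enumerate(zip(xt, mu)):
            est[i] += wk * (m + shrink * (x - a * m))
    return est, w


def mc_risks(means, weights, s2, alpha_bar, n=5000, seed=0):
    """Monte Carlo squared-error risk of the posterior mean vs. xt / sqrt(alpha_bar)."""
    rng = random.Random(seed)
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    opt = naive = 0.0
    for _ in range(n):
        k = rng.choices(range(len(means)), weights=weights)[0]
        x0 = [m + math.sqrt(s2) * rng.gauss(0.0, 1.0) for m in means[k]]
        xt = [a * x + b * rng.gauss(0.0, 1.0) for x in x0]
        est, _ = posterior_mean(xt, means, weights, s2, alpha_bar)
        opt += sum((e - x) ** 2 for e, x in zip(est, x0))
        naive += sum((y / a - x) ** 2 for y, x in zip(xt, x0))
    return opt / n, naive / n


if __name__ == "__main__":
    means = [[3.0, 0.0], [-3.0, 0.0]]
    weights = [0.5, 0.5]
    opt, naive = mc_risks(means, weights, s2=0.25, alpha_bar=0.5)
    print(f"posterior-mean risk ~ {opt:.3f}, baseline risk ~ {naive:.3f}")
```

With two well-separated components, the Monte Carlo risk of the posterior mean should land well below the baseline's risk of d(1 - abar)/abar. Because the posterior mean minimizes expected squared error, its risk is the Bayes risk in this toy setting.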

Abstract

Transformer-based diffusion models have demonstrated remarkable performance at generating high-quality samples. However, our theoretical understanding of the reasons for this success remains limited. For instance, existing models are typically trained by minimizing a denoising objective, which is equivalent to fitting the score function of the training data. However, we do not know why transformer-based models can match the score function for denoising, or why gradient-based methods converge to the optimal denoising model despite the non-convex loss landscape. To the best of our knowledge, this paper provides the first convergence analysis for training transformer-based diffusion models. More specifically, we consider the population Denoising Diffusion Probabilistic Model (DDPM) objective for denoising data that follow a multi-token Gaussian mixture distribution. We theoretically…
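The abstract's point that the denoising objective is equivalent to fitting the score can be checked numerically on a toy mixture. This check assumes the standard DDPM parameterization, with abar the cumulative noise schedule at a fixed step. Under that parameterization, the eps-prediction loss E||eps - eps_theta(xt, t)||^2 is minimized by E[eps | xt] = -sqrt(1 - abar) * grad log p_t(xt), and Tweedie's formula gives E[x0 | xt] = (xt + (1 - abar) * grad log p_t(xt)) / sqrt(abar). The sketch below computes the score of a noised isotropic Gaussian mixture in closed form and converts it into both denoiser forms. The function names are hypothetical, and the single-token mixture simplifies the paper's multi-token setting.

```python
import math


def _component_logits(xt, means, weights, s2, alpha_bar):
    """log pi_k - ||xt - a mu_k||^2 / (2v), plus the shared constants a and v."""
    a = math.sqrt(alpha_bar)
    v = a * a * s2 + (1.0 - alpha_bar)  # variance of each coordinate of xt given k
    logits = [
        math.log(pk) - sum((x - a * m) ** 2 for x, m in zip(xt, mu)) / (2 * v)
        for mu, pk in zip(means, weights)
    ]
    return logits, a, v


def log_density_t(xt, means, weights, s2, alpha_bar):
    """log p_t(xt), where xt | k ~ N(a mu_k, v I)."""
    logits, _, v = _component_logits(xt, means, weights, s2, alpha_bar)
    top = max(logits)
    log_mix = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_mix - 0.5 * len(xt) * math.log(2 * math.pi * v)


def score_t(xt, means, weights, s2, alpha_bar):
    """Closed-form score: sum_k w_k(xt) * (a mu_k - xt) / v."""
    logits, a, v = _component_logits(xt, means, weights, s2, alpha_bar)
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    w = [e / total for e in exps]  # posterior component weights
    return [
        sum(wk * (a * mu[i] - xt[i]) for wk, mu in zip(w, means)) / v
        for i in range(len(xt))
    ]


def denoiser_from_score(xt, score, alpha_bar):
    """Tweedie's formula: E[x0 | xt] = (xt + (1 - alpha_bar) * score) / sqrt(alpha_bar)."""
    a = math.sqrt(alpha_bar)
    return [(x + (1.0 - alpha_bar) * s) / a for x, s in zip(xt, score)]


def eps_from_score(score, alpha_bar):
    """Minimizer of the eps-prediction loss: E[eps | xt] = -sqrt(1 - alpha_bar) * score."""
    b = math.sqrt(1.0 - alpha_bar)
    return [-b * s for s in score]


if __name__ == "__main__":
    means, weights = [[3.0, 0.0], [-3.0, 0.0]], [0.5, 0.5]
    xt = [0.4, -0.7]
    sc = score_t(xt, means, weights, s2=0.25, alpha_bar=0.5)
    print("score:", sc)
    print("E[x0|xt]:", denoiser_from_score(xt, sc, 0.5))
    print("E[eps|xt]:", eps_from_score(sc, 0.5))
```

A central finite difference of `log_density_t` should agree with `score_t` to about six decimal places. The Tweedie output should also match the posterior mean computed directly from the mixture. Together these give a cheap sanity check when trying other schedules or mixtures.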
