Beyond Single Tokens: Distilling Discrete Diffusion Models via Discrete MMD

Emiel Hoogeboom; David Ruhe; Jonathan Heek; Thomas Mensink; Tim Salimans

arXiv:2603.20155 · cs.LG · March 23, 2026

Open Access

TL;DR

This paper introduces Discrete Moment Matching Distillation (D-MMD), a method for distilling discrete diffusion models that adapts techniques from the continuous domain. The distilled generators retain high quality and diversity on text and image data and can outperform their teachers.

Contribution

The paper proposes D-MMD, a new distillation approach for discrete diffusion models that preserves quality and diversity, outperforming previous methods and enabling efficient sampling.

Findings

01

D-MMD maintains high quality and diversity in distilled models.

02

Distilled models can outperform their original teachers.

03

Effective on both text and image datasets.

Abstract

It is currently difficult to distill discrete diffusion models. In contrast, the continuous diffusion literature has many distillation methods that can reduce sampling steps to a handful. Our method, Discrete Moment Matching Distillation (D-MMD), leverages ideas that have been highly successful in the continuous domain. Whereas previous discrete distillation methods collapse, D-MMD maintains high quality and diversity (given sufficient sampling steps). This is demonstrated on both text and image datasets. Moreover, the newly distilled generators can outperform their teachers.

Taxonomy

Topics: Advanced Neuroimaging Techniques and Applications · Generative Adversarial Networks and Image Synthesis · Face recognition and analysis