Beyond Single Tokens: Distilling Discrete Diffusion Models via Discrete MMD
Emiel Hoogeboom, David Ruhe, Jonathan Heek, Thomas Mensink, Tim Salimans

TL;DR
This paper introduces Discrete Moment Matching Distillation (D-MMD), a method for distilling discrete diffusion models that adapts techniques from the continuous domain. The distilled generators retain high quality and diversity on text and image data, and can outperform their teachers.
Contribution
The paper proposes D-MMD, a new distillation approach for discrete diffusion models. It preserves quality and diversity where previous discrete distillation methods collapse, and it enables efficient sampling.
Findings
D-MMD maintains high quality and diversity in distilled models, given sufficient sampling steps.
Distilled models can outperform their original teachers.
Effective on both text and image datasets.
Abstract
It is currently difficult to distill discrete diffusion models. In contrast, the continuous diffusion literature has many distillation approaches that can reduce sampling steps to a handful. Our method, Discrete Moment Matching Distillation (D-MMD), leverages ideas that have been highly successful in the continuous domain. Whereas previous discrete distillation methods collapse, D-MMD maintains high quality and diversity (given sufficient sampling steps). This is demonstrated on both text and image datasets. Moreover, the newly distilled generators can outperform their teachers.
Taxonomy
Topics: Advanced Neuroimaging Techniques and Applications · Generative Adversarial Networks and Image Synthesis · Face recognition and analysis
