Discrete Diffusion in Large Language and Multimodal Models: A Survey
Runpeng Yu, Qi Li, Xinchao Wang

TL;DR
This survey reviews discrete diffusion models for language and multimodal tasks, highlighting their parallel decoding, performance, and potential as alternatives to autoregressive models.
Contribution
It provides a comprehensive overview of the development, techniques, and applications of discrete diffusion language models and multimodal models, emphasizing their advantages and future directions.
Findings
Discrete diffusion models enable parallel decoding and fine-grained control.
d(M)LLMs reach performance comparable to autoregressive models while offering up to 10× faster inference.
Emerging applications span language, vision-language, and biological domains.
Abstract
In this work, we provide a systematic survey of Discrete Diffusion Language Models (dLLMs) and Discrete Diffusion Multimodal Language Models (dMLLMs). Unlike autoregressive (AR) models, dLLMs and dMLLMs adopt a multi-token, parallel decoding paradigm using full attention and a denoising-based generation strategy. This paradigm naturally enables parallel generation, fine-grained output control, and dynamic perception, capabilities that were previously difficult to achieve with AR models. A growing number of industrial-scale proprietary d(M)LLMs, as well as a large number of open-source academic d(M)LLMs, have demonstrated performance comparable to their autoregressive counterparts, while achieving up to 10× acceleration in inference speed. These developments position discrete diffusion models as a promising alternative to intelligence based on the traditional autoregressive…
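The multi-token, denoising-based decoding described in the abstract can be illustrated with a toy sketch. This is not the survey's algorithm or any particular model's implementation: `toy_denoiser` is a hypothetical stand-in for a full-attention network, and the confidence-ranked unmasking schedule is an assumption chosen for illustration. The sequence starts fully masked, and each step commits several positions at once instead of one token at a time.

```python
import random

MASK = "<mask>"

def toy_denoiser(seq, vocab, rng):
    """Hypothetical stand-in for a full-attention denoising network.

    For every masked position it proposes a token and a confidence score.
    A real model would condition on the whole sequence; this one is random.
    """
    return {i: (rng.choice(vocab), rng.random())
            for i, tok in enumerate(seq) if tok == MASK}

def parallel_decode(length, vocab, steps, seed=0):
    """Fill a fully masked sequence in at most `steps` denoising rounds."""
    rng = random.Random(seed)
    seq = [MASK] * length
    for step in range(steps):
        proposals = toy_denoiser(seq, vocab, rng)
        if not proposals:
            break
        # Assumed schedule: spread the remaining masks evenly over the
        # remaining steps (ceiling division), so the last step fills all.
        remaining_steps = steps - step
        k = -(-len(proposals) // remaining_steps)
        # Commit the k most confident proposals in parallel.
        best = sorted(proposals, key=lambda i: proposals[i][1], reverse=True)[:k]
        for i in best:
            seq[i] = proposals[i][0]
    return seq

if __name__ == "__main__":
    print(parallel_decode(length=8, vocab=["a", "b", "c"], steps=4))
```

With 8 positions and 4 steps, the sketch commits two tokens per step, whereas token-by-token autoregressive decoding would need 8 sequential steps. The speedup of a real model depends on its schedule and network cost, which this toy does not measure.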
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems
Methods: Diffusion · ADaptive gradient method with the OPTimal convergence rate
