Discrete Diffusion in Large Language and Multimodal Models: A Survey

Runpeng Yu; Qi Li; Xinchao Wang

arXiv:2506.13759·cs.LG·September 22, 2025

TL;DR

This survey reviews discrete diffusion models for language and multimodal tasks, highlighting their parallel decoding, performance, and potential as alternatives to autoregressive models.

Contribution

It provides a comprehensive overview of the development, techniques, and applications of discrete diffusion language models and multimodal models, emphasizing their advantages and future directions.

Findings

01 Discrete diffusion models enable parallel decoding and fine-grained output control.

02 d(M)LLMs perform comparably to autoregressive models while achieving up to 10$\times$ faster inference.

03 Emerging applications span language, vision-language, and biological domains.

Abstract

In this work, we provide a systematic survey of Discrete Diffusion Language Models (dLLMs) and Discrete Diffusion Multimodal Language Models (dMLLMs). Unlike autoregressive (AR) models, dLLMs and dMLLMs adopt a multi-token, parallel decoding paradigm using full attention and a denoising-based generation strategy. This paradigm naturally enables parallel generation, fine-grained output control, and dynamic perception, capabilities that were previously difficult to achieve with AR models. A growing number of industrial-scale proprietary d(M)LLMs, as well as a large number of open-source academic d(M)LLMs, have demonstrated performance comparable to their autoregressive counterparts while achieving up to 10$\times$ acceleration in inference speed. These developments position discrete diffusion models as a promising alternative to intelligence based on the traditional autoregressive…
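To make the contrast with autoregressive decoding concrete, the following is a minimal sketch of a mask-based parallel decoding loop. It is illustrative only and not taken from the survey: the `MASK` id, the `toy_denoiser` stand-in (which returns random predictions instead of running a full-attention network), and the confidence-based unmasking schedule are all assumptions chosen for the example.

```python
import random

MASK = -1  # hypothetical id standing in for a [MASK] token


def toy_denoiser(tokens, vocab_size, rng):
    """Stand-in for a full-attention denoiser (assumption, not a real model).

    Returns one (predicted_token, confidence) pair per position. A real
    denoiser would condition on every token in the sequence; this toy
    version ignores `tokens` and draws random predictions.
    """
    return [(rng.randrange(vocab_size), rng.random()) for _ in tokens]


def parallel_decode(length, steps, vocab_size=100, seed=0):
    """Start from an all-masked sequence and unmask several tokens per step."""
    rng = random.Random(seed)
    tokens = [MASK] * length
    history = []  # number of tokens committed at each step

    for step in range(steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break

        # One forward pass predicts every position at once.
        preds = toy_denoiser(tokens, vocab_size, rng)

        # Spread the remaining masked positions evenly over the remaining steps.
        remaining_steps = steps - step
        k = -(-len(masked) // remaining_steps)  # ceiling division

        # Commit the k masked positions with the highest confidence.
        chosen = sorted(masked, key=lambda i: preds[i][1], reverse=True)[:k]
        for i in chosen:
            tokens[i] = preds[i][0]
        history.append(len(chosen))

    return tokens, history


if __name__ == "__main__":
    tokens, history = parallel_decode(length=16, steps=4)
    print("tokens committed per step:", history)
    print("decoded sequence:", tokens)
```

In this sketch a 16-token sequence is filled in 4 denoiser calls, whereas token-by-token autoregressive decoding would need 16. Actual speedups depend on the model, the unmasking schedule, and the per-call cost, none of which the toy captures.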

Code & Models

Repositories

liqiiiii/dllm-survey (Official)

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems

Methods: Diffusion · ADaptive gradient method with the OPTimal convergence rate