Discrete Diffusion Models for Language Generation
Ashen Weligalle

TL;DR
This paper explores the application of discrete diffusion models to natural language generation, comparing their performance and efficiency with traditional autoregressive models, and highlighting their potential for parallel processing.
Contribution
It provides the first comprehensive evaluation of discrete diffusion models for language, demonstrating their strengths and limitations relative to autoregressive approaches.
Findings
D3PM achieves 5.72 bits per token (BPT), indicating competitive generative quality.
Autoregressive models compress better, achieving lower BPT than D3PM.
D3PM processes batches faster, reaching up to 3.97 batches/sec.
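As a reading aid for the BPT figures above, the following is a minimal sketch of how bits per token relates to negative log-likelihood and perplexity under the standard information-theoretic definitions. The function names are hypothetical, and the sketch assumes the NLL is a per-token average measured in nats; the thesis may compute its metrics differently, for example from a variational bound in the diffusion case.

```python
import math


def bits_per_token(nll_nats_per_token: float) -> float:
    """Convert a mean per-token negative log-likelihood from nats to bits.

    Assumption: the input is an average over tokens, in nats.
    """
    return nll_nats_per_token / math.log(2)


def perplexity_from_bpt(bpt: float) -> float:
    """Per-token perplexity implied by a bits-per-token value (2 ** BPT)."""
    return 2.0 ** bpt


if __name__ == "__main__":
    # A model that assigns each token probability 1/8 has NLL = ln(8) nats.
    nll = math.log(8)
    bpt = bits_per_token(nll)
    print(bpt, perplexity_from_bpt(bpt))
```

Lower BPT means the model assigns higher probability to the held-out text, which is why it doubles as a compression measure.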
Abstract
Diffusion models have emerged as a powerful class of generative models, achieving state-of-the-art results in continuous data domains such as image and video generation. Their core mechanism involves a forward diffusion process that gradually transforms structured data into a Gaussian-like distribution, followed by a learned reverse process to reconstruct the data. While successful in continuous modalities, applying this framework to discrete data, particularly natural language, remains challenging due to token dependency complexities and the lack of a defined generation order. This thesis investigates the feasibility and performance of discrete diffusion models for natural language generation. Specifically, we evaluate the Discrete Denoising Diffusion Probabilistic Model (D3PM) and compare it with traditional autoregressive (AR) language models. To assess generative performance, we use…
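To make the forward process concrete for discrete tokens, here is a minimal sketch of corruption with a uniform transition matrix, one of the transition choices described for D3PM. This summary does not state which transition matrix or noise schedule the thesis uses, so the function names, the `betas` schedule, and the toy vocabulary size are all assumptions for illustration only.

```python
import random


def uniform_transition_row(token: int, vocab_size: int, beta: float) -> list:
    """Row `token` of Q = (1 - beta) * I + (beta / K) * ones, with K = vocab_size.

    With probability (1 - beta) the token is kept; otherwise it is
    resampled uniformly over the vocabulary (possibly landing on itself).
    """
    row = [beta / vocab_size] * vocab_size
    row[token] += 1.0 - beta
    return row


def corrupt(tokens, vocab_size, betas, rng):
    """Apply one forward (noising) step per entry in `betas`.

    Each token is corrupted independently, which is what lets all
    positions be processed in parallel.
    """
    x = list(tokens)
    ids = range(vocab_size)
    for beta in betas:
        x = [
            rng.choices(ids, weights=uniform_transition_row(t, vocab_size, beta))[0]
            for t in x
        ]
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = [3, 1, 4, 1, 5]       # hypothetical token ids
    schedule = [0.1, 0.3, 0.6, 1.0]  # hypothetical noise schedule
    print(corrupt(sentence, vocab_size=8, betas=schedule, rng=rng))
```

With a final `beta` of 1.0 every token is resampled uniformly, so the sequence ends in pure categorical noise, the discrete counterpart of the Gaussian endpoint in continuous diffusion. The learned reverse process would then be trained to undo these steps.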
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Speech Recognition and Synthesis · Language and cultural evolution
