Differences in Text Generated by Diffusion and Autoregressive Language Models

Zeyang Zhang; Chengwei Liang; Xingyan Chen; Meiqi Gu; Minrui Luo; Jingzhao Zhang; Tianxing He

arXiv:2605.12522·cs.CL·May 14, 2026

TL;DR

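This paper compares text generated by diffusion and autoregressive language models. Off-the-shelf diffusion models produce text with lower $n$-gram entropy but higher semantic coherence and semantic diversity. Controlled experiments attribute the coherence and diversity gains mainly to the training objective, and the entropy reduction mainly to the decoding algorithm.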

Contribution

It provides an empirical and theoretical analysis of the intrinsic differences between diffusion and autoregressive language models, highlighting the roles of training objectives and decoding algorithms.

Findings

01

Diffusion models have lower n-gram entropy than autoregressive models.

02

Diffusion models exhibit higher semantic coherence and semantic diversity than autoregressive models.

03

Decoding algorithms mainly cause the entropy reduction in diffusion models.
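
The entropy findings can be made concrete with a small example. This page does not reproduce the paper's exact definition of $n$-gram entropy, so the sketch below assumes a common one: the Shannon entropy, in bits, of the empirical distribution of $n$-grams in a token sequence. The function name `ngram_entropy` and the whitespace tokenization are illustrative choices, not taken from the paper.

```python
import math
from collections import Counter


def ngram_entropy(tokens, n):
    """Shannon entropy (in bits) of the empirical n-gram distribution.

    Assumed definition for illustration; the paper may normalize or
    estimate entropy differently.
    """
    # Slide a window of size n over the token sequence.
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    total = len(ngrams)
    # H = -sum_g p(g) * log2 p(g), with p(g) the relative frequency of g.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Toy inputs (hypothetical, not from the paper's data).
repetitive = "the cat sat the cat sat the cat sat".split()
varied = "the cat sat on a mat near the door".split()

print(ngram_entropy(repetitive, 2))
print(ngram_entropy(varied, 2))
```

Under this assumed definition, a lower value means that fewer distinct $n$-grams account for more of the text, so the repeated toy sentence scores lower than the varied one.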

Abstract

Diffusion language models (DLMs) are promising alternatives to autoregressive language models (ARMs), yet the intrinsic differences in their generated text remain underexplored. We first find empirically that off-the-shelf DLMs exhibit lower $n$-gram entropy, higher semantic coherence, and higher semantic diversity. To understand the cause, we conduct controlled experiments that decouple the effects of training objectives and decoding algorithms. Results suggest that the DLM training objective contributes to the increases in semantic coherence and semantic diversity, but has a minor influence on entropy. These differences are primarily driven by the bidirectional context; other components in the training objective, such as input masking, label masking, and the weighting function, have a much weaker influence. Further, our experiments demonstrate that the reduction in entropy stems from…
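
The abstract singles out bidirectional context as the main driver among the training-objective components. One way to picture the distinction is as an attention visibility mask: a causal mask lets position i see only positions up to i, while a bidirectional mask lets every position see every other. The sketch below is only an illustration of that contrast; the helper names `causal_mask` and `bidirectional_mask` are hypothetical and do not describe the paper's experimental setup.

```python
def causal_mask(length):
    """mask[i][j] is True when position i may attend to position j.

    Autoregressive-style visibility: only the current and earlier positions.
    """
    return [[j <= i for j in range(length)] for i in range(length)]


def bidirectional_mask(length):
    """Bidirectional visibility: every position may attend to every position."""
    return [[True] * length for _ in range(length)]


def show(mask):
    """Render a mask as rows of 1s (visible) and 0s (hidden)."""
    return "\n".join(" ".join("1" if v else "0" for v in row) for row in mask)


print("causal:")
print(show(causal_mask(4)))
print("bidirectional:")
print(show(bidirectional_mask(4)))
```

The causal mask is lower triangular, and the bidirectional mask is all ones. This is the structural difference the abstract refers to as bidirectional context.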
