Unifying Masked Diffusion Models with Various Generation Orders and Beyond
Chunsan Hong, Sanghyun Lee, Jong Chul Ye

TL;DR
This paper introduces a unified framework for masked diffusion models (MDMs) that generate text in various orders. It also proposes a method that learns the generation order jointly with the model, which improves language modeling performance.
Contribution
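The paper presents OeMDM (order-expressive masked diffusion model), a framework covering a broad class of diffusion generative processes with various generation orders. It also presents LoMDM (learnable-order masked diffusion model), which learns the generation order and the diffusion backbone jointly, from scratch, through a single objective.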
Findings
LoMDM outperforms existing discrete diffusion models on language benchmarks.
OeMDM unifies MDM, ARM, and block diffusion in a single interpretative framework.
Learnable ordering enables context-dependent text generation.
Abstract
Masked diffusion models (MDMs) are a potential alternative to autoregressive models (ARMs) for language generation, but generation quality depends critically on the generation order. Prior work either hard-codes an ordering (e.g., blockwise left-to-right) or learns an ordering policy for a pretrained MDM, which incurs extra cost and can yield suboptimal solutions due to the two-stage optimization. Motivated by this, we propose order-expressive masked diffusion model (OeMDM) for a broad class of diffusion generative processes with various generation orders, enabling the interpretation of MDM, ARM, and block diffusion in a single framework. Furthermore, building on OeMDM, we introduce learnable-order masked diffusion model (LoMDM), which jointly learns the generation ordering and diffusion backbone through a single objective from scratch, enabling the diffusion model to generate text in…
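The abstract's claim that MDM, ARM, and block diffusion fit in one framework can be made concrete at the level of sampling order. The following is a toy sketch under stated assumptions, not the authors' OeMDM or LoMDM. It treats each model family only as a rule for which masked positions get revealed at each step, and it replaces the denoiser with a hypothetical `dummy_fill` function. All names (`generate`, `left_to_right`, `make_random`, `make_blockwise`, `dummy_fill`) are illustrative.

```python
import random
from typing import Callable, List, Optional

Seq = List[Optional[str]]            # None marks a still-masked position
Policy = Callable[[Seq], List[int]]  # returns the positions to unmask this step


def masked_positions(seq: Seq) -> List[int]:
    return [i for i, tok in enumerate(seq) if tok is None]


def left_to_right(seq: Seq) -> List[int]:
    # ARM-like order: always reveal the leftmost masked position, one per step.
    return masked_positions(seq)[:1]


def make_random(per_step: int, rng: random.Random) -> Policy:
    # Vanilla-MDM-like order: reveal up to `per_step` uniformly random masked positions.
    def policy(seq: Seq) -> List[int]:
        masked = masked_positions(seq)
        return rng.sample(masked, min(per_step, len(masked)))
    return policy


def make_blockwise(block_size: int, per_step: int, rng: random.Random) -> Policy:
    # Block-diffusion-like order: finish blocks left to right,
    # revealing random positions inside the current (leftmost unfinished) block.
    def policy(seq: Seq) -> List[int]:
        masked = masked_positions(seq)
        block = masked[0] // block_size
        in_block = [i for i in masked if i // block_size == block]
        return rng.sample(in_block, min(per_step, len(in_block)))
    return policy


def generate(length: int, policy: Policy, fill: Callable[[int, Seq], str]):
    # Start fully masked; each step the policy picks positions and `fill`
    # (a stand-in for the denoiser) commits a token at each of them.
    seq: Seq = [None] * length
    order: List[int] = []
    while masked_positions(seq):
        for i in policy(seq):
            seq[i] = fill(i, seq)
            order.append(i)
    return seq, order


# Hypothetical denoiser stand-in: a real MDM would predict each token from context.
def dummy_fill(i: int, seq: Seq) -> str:
    return f"tok{i}"


rng = random.Random(0)
arm_seq, arm_order = generate(8, left_to_right, dummy_fill)
blk_seq, blk_order = generate(8, make_blockwise(4, 2, rng), dummy_fill)
mdm_seq, mdm_order = generate(8, make_random(3, rng), dummy_fill)
print(arm_order)  # [0, 1, 2, 3, 4, 5, 6, 7]
print(blk_order)
print(mdm_order)
```

Only the policy changes between the three runs; the loop in `generate` is shared. In a real MDM, the tokens revealed in one step would be sampled in parallel from the model's prediction given the already-revealed context. Per the abstract, LoMDM learns the ordering jointly with the backbone instead of hard-coding it as these toy policies do.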
