TL;DR
The paper introduces the Infinite Mask Diffusion Model (IMDM), a stochastic extension of Masked Diffusion Models (MDMs) that enables efficient few-step language generation and outperforms existing methods in few-step distillation.
Contribution
IMDM replaces the deterministic single-state mask of MDMs with a stochastic infinite-state mask. This mitigates the theoretical lower bound on factorization error that standard MDMs cannot reduce, improving few-step generation and distillation performance.
Findings
IMDM outperforms standard MDMs on few-step synthetic tasks.
IMDM surpasses existing few-step distillation methods on LM1B and OpenWebText.
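Standard MDMs need many sampling steps because updating tokens simultaneously introduces a factorization error, and their deterministic single-state mask cannot reduce its theoretical lower bound.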
Abstract
Masked Diffusion Models (MDMs) have emerged as a promising alternative to autoregressive models in language modeling, offering the advantages of parallel decoding and bidirectional context processing within a simple yet effective framework. Specifically, their explicit distinction between masked tokens and data underlies their simple framework and effective conditional generation. However, MDMs typically require many sampling iterations due to factorization errors stemming from simultaneous token updates. We observe that a theoretical lower bound of the factorization error exists, which standard MDMs cannot reduce due to their use of a deterministic single-state mask. In this paper, we propose the Infinite Mask Diffusion Model (IMDM), which introduces a stochastic infinite-state mask to mitigate the theoretical bound while directly inheriting the benefits of MDMs, including the…
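The factorization error described in the abstract can be illustrated with a toy example. The sketch below is not the paper's method: the two-token data distribution, the two-state mask variable z, and the hand-written per-state predictions are all assumptions chosen for illustration, and the excerpt does not specify how the infinite-state mask is actually constructed. It shows only the general idea: with a fixed mask input, one-step parallel decoding can at best output the product of per-position marginals, while a randomly drawn mask state lets the sampler mix several factorized outputs.

```python
import itertools
import math

# Toy data distribution over two tokens: only "AA" and "BB" occur.
VOCAB = ("A", "B")
joint = {("A", "A"): 0.5, ("B", "B"): 0.5}


def marginals(p):
    """Per-position marginals of a distribution over token tuples."""
    length = len(next(iter(p)))
    out = [dict.fromkeys(VOCAB, 0.0) for _ in range(length)]
    for seq, prob in p.items():
        for i, tok in enumerate(seq):
            out[i][tok] += prob
    return out


def product_of(margs):
    """Distribution obtained by sampling every position independently."""
    q = {}
    for seq in itertools.product(VOCAB, repeat=len(margs)):
        q[seq] = math.prod(m[tok] for m, tok in zip(margs, seq))
    return q


def kl(p, q):
    """KL(p || q) in nats, skipping zero-probability terms of p."""
    return sum(pv * math.log(pv / q[s]) for s, pv in p.items() if pv > 0)


# One-step decoding from [MASK][MASK]: the input is always identical, so the
# model can only output one fixed set of per-position marginals.
one_step = product_of(marginals(joint))
factorization_error = kl(joint, one_step)

# ASSUMPTION: a hypothetical two-state stochastic mask z, drawn uniformly.
# The model conditions on z, so each z selects its own factorized output
# and the mixture over z can match the joint.
per_state = {
    0: [{"A": 1.0, "B": 0.0}, {"A": 1.0, "B": 0.0}],
    1: [{"A": 0.0, "B": 1.0}, {"A": 0.0, "B": 1.0}],
}
mixture = {}
for z, margs in per_state.items():
    for seq, prob in product_of(margs).items():
        mixture[seq] = mixture.get(seq, 0.0) + 0.5 * prob
stochastic_error = kl(joint, mixture)

print(f"deterministic mask: KL = {factorization_error:.4f} nats")  # 0.6931
print(f"stochastic mask:    KL = {stochastic_error:.4f} nats")     # 0.0000
```

In this toy case the deterministic mask leaves a gap of log 2 nats, which is the dependence between the two positions that no single factorized output can capture. The two-state mask closes the gap only because the toy data has exactly two modes; richer data would need more mask states, which is presumably the motivation for an infinite-state mask.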
