TL;DR
This paper revisits uniform diffusion models (UDM) and introduces a leave-one-out denoiser and an absorbing-state reformulation, which improve language modeling performance and offer insight into diffusion model design.
Contribution
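The paper introduces a leave-one-out posterior for uniform diffusion models, which predicts each clean token without using that token's own noisy observation, together with an absorbing-state reformulation. These improve inference and unify different diffusion approaches.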
Findings
Leave-one-out denoiser improves UDM generation quality.
Absorbing-state reformulation simplifies sampling and denoising.
Empirical results show improved language modeling performance.
Abstract
Discrete diffusion models are often trained through clean-data prediction, but the prediction can be used in different ways to define the reverse dynamics. In Masked Diffusion Models (MDM) these choices largely coincide, whereas in Uniform Diffusion Models (UDM) they do not. We show that the standard plug-in bridge parameterization for UDM is not optimized by the denoising posterior, but by a leave-one-out posterior that predicts each clean token without using its own noisy observation. This identifies a mismatch between the plug-in ELBO and the usual cross-entropy denoising objective. We characterize the leave-one-out target and derive exact conversions between the denoiser, the leave-one-out posterior, and the score. These conversions allow us to disentangle parameterization and training objective. Our results also lead to inference improvements without any additional training through…
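The relationship between the denoising posterior and the leave-one-out posterior can be illustrated for a single token position. The sketch below is not the paper's code. It assumes the standard uniform forward kernel, in which each token is kept with probability alpha and otherwise resampled uniformly from a vocabulary of size K, so that q(z | x) = alpha * 1[z = x] + (1 - alpha) / K. Under that assumption, and because tokens are noised independently, Bayes' rule gives denoiser(x) ∝ leave_one_out(x) * q(z | x). The function names and the example numbers are hypothetical, and the paper's exact conversions may be stated in different notation.

```python
def noisy_token_likelihood(x, z, alpha, vocab_size):
    """q(z | x) for the assumed uniform kernel: keep x w.p. alpha, else resample uniformly."""
    return alpha * (1.0 if x == z else 0.0) + (1.0 - alpha) / vocab_size


def loo_to_denoiser(loo, z, alpha):
    """Fold the token's own noisy observation z into the leave-one-out posterior."""
    k = len(loo)
    weights = [p * noisy_token_likelihood(x, z, alpha, k) for x, p in enumerate(loo)]
    total = sum(weights)
    return [w / total for w in weights]


def denoiser_to_loo(denoiser, z, alpha):
    """Divide out the token's own observation; requires alpha < 1 so the likelihood is nonzero."""
    k = len(denoiser)
    weights = [p / noisy_token_likelihood(x, z, alpha, k) for x, p in enumerate(denoiser)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: vocabulary of 3 tokens, observed noisy token z = 1.
loo = [0.5, 0.3, 0.2]
denoiser = loo_to_denoiser(loo, z=1, alpha=0.6)
recovered = denoiser_to_loo(denoiser, z=1, alpha=0.6)
```

In this example the denoiser places more mass on token 1 than the leave-one-out posterior does, because it also sees the noisy observation at that position. Converting back recovers the original leave-one-out distribution up to floating-point error.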
