Simple Self-Conditioning Adaptation for Masked Diffusion Models
Michael Cardei, Huu Binh Ta, Ferdinando Fioretto

TL;DR
This paper introduces Self-Conditioned Masked Diffusion Models (SCMDM), a simple post-training adaptation that improves discrete sequence generation by conditioning each denoising step on the model's own previous clean-state predictions, with gains across multiple domains.
Contribution
The paper proposes a minimal-change, effective self-conditioning method for masked diffusion models. It needs no complex training procedure and no additional denoiser evaluations during sampling, and it outperforms existing approaches.
Findings
Generative perplexity of OpenWebText (OWT) models drops from 42.89 to 23.72, a reduction of about 45% (nearly half)
Improved quality in image synthesis, molecular generation, and genomic modeling
Self-conditioning outperforms partial self-conditioning strategies like 50% dropout.
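The percentage in the first finding follows directly from the two quoted perplexity values. A minimal check, assuming 42.89 is the model before the SCMDM adaptation and 23.72 is after it (the helper name `relative_reduction` is hypothetical):

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional reduction going from `before` to `after` (0.0 means no change)."""
    return (before - after) / before


# Generative perplexity values quoted in the findings (lower is better).
baseline_ppl = 42.89  # assumed: masked diffusion model before adaptation
scmdm_ppl = 23.72     # assumed: same model after SCMDM adaptation

if __name__ == "__main__":
    print(f"{relative_reduction(baseline_ppl, scmdm_ppl):.1%}")  # prints 44.7%
```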
Abstract
Masked diffusion models (MDMs) generate discrete sequences by iterative denoising under an absorbing masking process. In standard masked diffusion, if a token remains masked after a reverse update, the model discards its clean-state prediction for that position. Thus, still-masked positions must be repeatedly inferred from the mask token alone. This design choice limits cross-step refinement. To address this limitation, this paper proposes a simple, yet effective, post-training adaptation for MDMs that conditions each denoising step on the model's own previous clean-state predictions. The resulting method, called Self-Conditioned Masked Diffusion Models (SCMDM), requires minimal architectural change, does not introduce a recurrent latent-state pathway, does not rely on an auxiliary reference model, and adds no extra denoiser evaluations during sampling. This is an important departure…
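The difference the abstract describes can be sketched as a sampling loop. In standard masked diffusion, the clean-state predictions for positions that stay masked are thrown away after each step. The self-conditioned variant keeps them and feeds them into the next call, and it still makes exactly one denoiser call per step. This is a minimal, non-authoritative sketch and not the authors' code. It assumes the denoiser accepts an optional second input holding its previous per-position distributions. The names `sample_scmdm` and `toy_denoiser`, the `MASK` id, the toy vocabulary, the blending rule, and the linear unmasking schedule are all hypothetical.

```python
import random
from typing import Callable, List, Optional

MASK = -1  # hypothetical id for the absorbing [MASK] token
Probs = List[List[float]]  # one distribution over the vocabulary per position


def toy_denoiser(tokens: List[int], prev_probs: Optional[Probs], vocab: int = 4) -> Probs:
    """Hypothetical stand-in for a trained network, for illustration only."""
    out: Probs = []
    for i, tok in enumerate(tokens):
        if tok != MASK:
            # Decoded positions keep their token (absorbing masking process).
            out.append([1.0 if v == tok else 0.0 for v in range(vocab)])
            continue
        scores = [1.0] * vocab
        scores[i % vocab] += 2.0  # arbitrary toy preference per position
        if prev_probs is not None:
            # Self-conditioning (assumed form): blend in the previous clean-state estimate.
            scores = [s + 3.0 * p for s, p in zip(scores, prev_probs[i])]
        total = sum(scores)
        out.append([s / total for s in scores])
    return out


def sample_scmdm(
    denoiser: Callable[[List[int], Optional[Probs]], Probs],
    length: int,
    num_steps: int,
    self_condition: bool = True,
    seed: int = 0,
) -> List[int]:
    rng = random.Random(seed)
    tokens = [MASK] * length
    prev_probs: Optional[Probs] = None
    for step in range(num_steps, 0, -1):
        # Exactly one denoiser call per step, with or without self-conditioning.
        probs = denoiser(tokens, prev_probs if self_condition else None)
        masked = [i for i, tok in enumerate(tokens) if tok == MASK]
        # Linear schedule: unmask about 1/step of the remaining masked positions;
        # on the final step, unmask everything that is left.
        n_unmask = len(masked) if step == 1 else round(len(masked) / step)
        for i in rng.sample(masked, n_unmask):
            tokens[i] = rng.choices(range(len(probs[i])), weights=probs[i])[0]
        # Standard MDM discards `probs` here; self-conditioning carries them forward.
        prev_probs = probs
    return tokens


if __name__ == "__main__":
    print(sample_scmdm(toy_denoiser, length=8, num_steps=4))
```

Setting `self_condition=False` recovers the standard loop with the same number of denoiser calls. This mirrors the abstract's point that self-conditioning adds no extra evaluations: only the denoiser's input changes.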
