Masked Diffusion Models are Secretly Learned-Order Autoregressive Models
Prateek Garg, Bhavya Kohli, Sunita Sarawagi

TL;DR
This paper shows that Masked Diffusion Models (MDMs) can learn a favorable decoding order during training when equipped with multivariate noise schedules, which effectively makes them autoregressive models with a learned order.
Contribution
It introduces a training framework that optimizes the decoding order of MDMs through multivariate noise schedules, and shows that the resulting models behave as autoregressive models with learnable orders.
Findings
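With multivariate noise schedules, the continuous-time variational objective of MDMs can identify and optimize a decoding order during training.
The MDM objective decomposes into weighted autoregressive losses.
Multivariate noise schedules break the invariance of the MDM objective to the noise schedule.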
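For orientation, the decomposition finding can be read against the standard single-schedule case, where the continuous-time MDM objective is commonly shown to equal an autoregressive loss averaged over uniformly random decoding orders. The block below writes that baseline and then a generic weighted form. The symbols are assumptions for illustration and not the paper's notation: $L$ is the sequence length, $\sigma$ a permutation of positions, $p_\theta$ the denoising model, and $w(\sigma)$ a hypothetical non-negative weight per order whose exact form is not given in this summary.

```latex
% Baseline (single shared noise schedule): uniform average over decoding orders
\mathcal{L}_{\mathrm{MDM}}(x)
  = \mathbb{E}_{\sigma \sim \mathrm{Unif}(S_L)}
    \left[ -\sum_{l=1}^{L} \log p_\theta\!\left(x_{\sigma(l)} \mid x_{\sigma(<l)}\right) \right]

% Generic weighted form (w is a hypothetical placeholder, not the paper's definition)
\mathcal{L}_{w}(x)
  = \sum_{\sigma \in S_L} w(\sigma)
    \left[ -\sum_{l=1}^{L} \log p_\theta\!\left(x_{\sigma(l)} \mid x_{\sigma(<l)}\right) \right],
  \qquad w(\sigma) \ge 0
```

In the baseline every order receives the same weight, which is one way to read the statement that MDMs train to decode in a random order.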
Abstract
Masked Diffusion Models (MDMs) have emerged as one of the most promising paradigms for generative modeling over discrete domains. It is known that MDMs effectively train to decode tokens in a random order, and that this ordering has significant performance implications in practice. This observation raises a fundamental question: can we design a training framework that optimizes for a favorable decoding order? We answer this in the affirmative, showing that the continuous-time variational objective of MDMs, when equipped with multivariate noise schedules, can identify and optimize for a decoding order during training. We establish a direct correspondence between decoding order and the multivariate noise schedule and show that this setting breaks invariance of the MDM objective to the noise schedule. Furthermore, we prove that the MDM objective decomposes precisely into a weighted…
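The correspondence between a per-position noise schedule and a decoding order can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's method: it assumes a hypothetical schedule family in which position i is masked by time t with probability t ** w_i, and the names `sample_decoding_order`, `first_position_frequencies`, and `weights` are invented for this example. Because the reverse process runs from t = 1 down to t = 0, a position that was masked later in the forward process is unmasked earlier during decoding.

```python
import random


def sample_decoding_order(weights, rng):
    """Sample one decoding order induced by per-position schedules.

    Assumption (hypothetical schedule family): position i is masked by
    time t with probability t ** weights[i], for t in [0, 1].
    """
    # Inverse-transform sampling of the forward masking time T_i,
    # whose CDF is P(T_i <= t) = t ** w_i.
    times = [rng.random() ** (1.0 / w) for w in weights]
    # The reverse process runs from t = 1 to t = 0, so positions with
    # larger masking times are unmasked (decoded) first.
    return sorted(range(len(weights)), key=lambda i: -times[i])


def first_position_frequencies(weights, num_samples=20000, seed=0):
    """Estimate how often each position is the first one decoded."""
    rng = random.Random(seed)
    counts = [0] * len(weights)
    for _ in range(num_samples):
        counts[sample_decoding_order(weights, rng)[0]] += 1
    return [c / num_samples for c in counts]


if __name__ == "__main__":
    # Identical schedules: every position is equally likely to go first.
    print(first_position_frequencies([1.0, 1.0, 1.0, 1.0]))
    # A larger weight on position 0 makes it much more likely to go first.
    print(first_position_frequencies([8.0, 1.0, 1.0, 1.0]))
```

With identical weights the sampled order is uniformly random, matching the random-order behavior described in the abstract. With unequal weights the order is biased: under this particular family, position i is decoded first with probability w_i divided by the sum of all weights, so the schedule parameters act as a handle on the decoding order.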
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Topic Modeling · Speech Recognition and Synthesis
