Learning Generation Orders for Masked Discrete Diffusion Models via Variational Inference
David Fox, Sam Bowyer, Song Liu, Laurence Aitchison, Raul Santos-Rodriguez, Mengyue Yang

TL;DR
This paper introduces a variational inference framework for learning parallel generation orders in masked discrete diffusion models (MDMs), aiming for a better balance between parallel generation and sample quality.
Contribution
The paper proposes a variational inference approach for learning generation orders in MDMs, together with a parameterisation of the approximate posterior that facilitates parallelism and efficient sampling during training.
Findings
Achieves 33.1% accuracy with 4 steps on GSM8K, outperforming standard methods.
Demonstrates competitive performance against heuristic strategies in highly parallel regimes.
Provides a new parameterization for the approximate posterior of generation orders.
Abstract
Masked discrete diffusion models (MDMs) are a promising new approach to generative modelling, offering parallel token generation and therefore greater efficiency than their autoregressive counterparts. However, achieving an optimal balance between parallel generation and sample quality remains an open problem. Current approaches primarily address this issue through fixed, heuristic parallel sampling methods. Some recent learning-based approaches to this problem exist, but formulating it from the perspective of variational inference remains underexplored. In this work, we propose a variational inference framework for learning parallel generation orders for MDMs. As part of our method, we propose a parameterisation for the approximate posterior of generation orders which facilitates parallelism and efficient sampling during training. Using this method, we conduct…
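To make the "fixed, heuristic parallel sampling" baseline concrete, here is a minimal sketch of one common heuristic, confidence-based unmasking: at each step, reveal the masked positions where the model is most confident. The abstract does not say which heuristics the paper compares against, so this choice is an assumption. The names `toy_denoiser`, `confidence_unmask`, and `MASK` are hypothetical, and the denoiser is a random stand-in rather than a trained MDM. This is not the paper's learned method.

```python
import math
import numpy as np

MASK = -1  # hypothetical sentinel id for a masked position


def toy_denoiser(tokens, vocab_size, rng):
    """Random stand-in for a trained MDM: per-position token probabilities."""
    logits = rng.normal(size=(len(tokens), vocab_size))
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)


def confidence_unmask(length, vocab_size, num_steps, seed=0):
    """Fill `length` masked positions in at most `num_steps` parallel steps."""
    rng = np.random.default_rng(seed)
    tokens = np.full(length, MASK, dtype=np.int64)
    order = []  # positions revealed at each step (the generation order)
    for step in range(num_steps):
        masked = np.flatnonzero(tokens == MASK)
        if masked.size == 0:
            break
        probs = toy_denoiser(tokens, vocab_size, rng)
        # Reveal an even share of what is left, so the last step finishes.
        k = math.ceil(masked.size / (num_steps - step))
        confidence = probs[masked].max(axis=1)
        chosen = masked[np.argsort(-confidence)[:k]]
        tokens[chosen] = probs[chosen].argmax(axis=1)
        order.append(sorted(chosen.tolist()))
    return tokens, order


tokens, order = confidence_unmask(length=8, vocab_size=5, num_steps=4)
print(order)  # four groups of two positions each
```

The rule that picks `chosen` is fixed in advance here. A learned approach, as the abstract describes, would instead treat the generation order as something to be inferred.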
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Markov Chains and Monte Carlo Methods · Model Reduction and Neural Networks
