Parallel Sampling from Masked Diffusion Models via Conditional Independence Testing
Iskander Azangulov, Teodora Pandeva, Niranjani Prasad, Javier Zazo, Sushrut Karmalkar

TL;DR
PUNT is a parallel sampling method for masked diffusion models that balances token independence against prediction confidence, enabling faster text generation with improved accuracy and inducing a hierarchical, planning-like generation order.
Contribution
It introduces a model-agnostic approach that identifies token dependencies and removes conflicting tokens to enable effective parallel sampling in masked diffusion models.
Findings
Up to 16% higher accuracy on IFEval benchmark.
Better trade-off between accuracy and compute compared to baselines.
Induces hierarchical, planning-like generation strategy.
Abstract
Masked diffusion models (MDMs) offer a compelling alternative to autoregressive models (ARMs) for discrete text generation because they enable parallel token sampling, rather than sequential, left-to-right generation. This means potentially much faster inference. However, effective parallel sampling faces two competing requirements: (i) simultaneously updated tokens must be conditionally independent, and (ii) updates should prioritise high-confidence predictions. These goals conflict because high-confidence predictions often cluster and depend on each other, reducing opportunities for parallel updates. We present PUNT, a model-agnostic sampler that reconciles this trade-off. Our method identifies token dependencies and removes lower-confidence tokens from conflicting groups. This produces sets of indices for unmasking that satisfy both independence and confidence criteria. Our approach ensures…
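The selection step described in the abstract (find dependent tokens, then drop the lower-confidence member of each conflicting group) can be illustrated with a small greedy sketch. This is not the paper's algorithm: the function name `select_independent_indices`, the precomputed boolean dependency matrix `depends`, and the greedy ordering are all assumptions made for illustration, and the conditional independence test that PUNT uses to detect dependencies is not reproduced here.

```python
from typing import List, Sequence


def select_independent_indices(
    confidence: Sequence[float],
    depends: Sequence[Sequence[bool]],
) -> List[int]:
    """Pick masked positions to unmask together (illustrative greedy sketch).

    confidence[i] -- hypothetical model confidence for masked position i.
    depends[i][j] -- True if positions i and j are assumed to be dependent;
                     how this matrix is obtained is outside this sketch.
    """
    # Visit positions from most to least confident.
    order = sorted(range(len(confidence)), key=lambda i: confidence[i], reverse=True)
    kept: List[int] = []
    for i in order:
        # Keep i only if it conflicts with no higher-confidence position
        # that was already kept; otherwise the lower-confidence token is dropped.
        if all(not (depends[i][j] or depends[j][i]) for j in kept):
            kept.append(i)
    return sorted(kept)


# Toy input: positions 0 and 1 depend on each other, as do positions 2 and 3.
confidence = [0.9, 0.8, 0.6, 0.95]
depends = [
    [False, True, False, False],
    [True, False, False, False],
    [False, False, False, True],
    [False, False, True, False],
]
selected = select_independent_indices(confidence, depends)
```

In this toy input, positions 0 and 3 are kept and positions 1 and 2 are deferred to a later step, since each conflicts with a more confident neighbour.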
