The Diffusion Duality, Chapter II: $\Psi$-Samplers

Justin Deschenaux; Caglar Gulcehre; Subham Sekhar Sahoo

arXiv:2602.21185·cs.LG·May 19, 2026

TL;DR

The paper introduces Predictor-Corrector (PC) samplers for discrete diffusion models. Paired with uniform-state diffusion, they outperform ancestral sampling on language and image generation and keep improving as the number of sampling steps grows, challenging the assumed dominance of Masked diffusion.

Contribution

The paper develops a family of PC samplers that generalize prior methods and apply to arbitrary noise processes, improving sample quality in discrete diffusion models.
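To make the predictor-corrector structure concrete, here is a minimal toy sketch. It is not the paper's $\Psi$-sampler: the predictor is a simple predict-then-renoise step rather than the exact ancestral posterior, and the corrector is a Gibbs-style resampling at a fixed noise level. Everything named here is an assumption made for illustration: the uniform-state forward process is taken to keep each token with probability `alpha` and otherwise resample it uniformly, the linear `alpha` schedule is arbitrary, and `toy_denoiser` stands in for a trained network by returning the exact posterior for i.i.d. tokens with a known `prior`.

```python
import numpy as np

def forward_sample(x0, alpha, K, rng):
    """Draw z ~ q(z | x0): keep each token w.p. alpha, else resample uniformly."""
    keep = rng.random(x0.shape) < alpha
    noise = rng.integers(0, K, size=x0.shape)
    return np.where(keep, x0, noise)

def toy_denoiser(z, alpha, prior):
    """Hypothetical stand-in for a network: exact p(x0 | z) for i.i.d. tokens."""
    K = prior.shape[0]
    lik = np.full((z.size, K), (1.0 - alpha) / K)  # uniform-noise part of q(z | x0)
    lik[np.arange(z.size), z.ravel()] += alpha      # extra mass when x0 == z
    post = lik * prior
    post /= post.sum(axis=1, keepdims=True)
    return post.reshape(*z.shape, K)

def sample_categorical(probs, rng):
    """Inverse-CDF sampling along the last axis."""
    cdf = np.cumsum(probs, axis=-1)
    u = rng.random(probs.shape[:-1] + (1,))
    idx = (u >= cdf).sum(axis=-1)
    return np.minimum(idx, probs.shape[-1] - 1)     # guard against round-off

def pc_sample(prior, length, num_steps, corrector_steps, rng):
    """Toy predictor-corrector loop from pure noise (alpha=0) to clean (alpha=1)."""
    K = prior.shape[0]
    alphas = np.linspace(0.0, 1.0, num_steps + 1)   # assumed linear schedule
    z = rng.integers(0, K, size=length)             # start from uniform noise
    for i in range(num_steps):
        a_t, a_s = alphas[i], alphas[i + 1]
        # Predictor: guess clean tokens, then renoise to the next (cleaner) level.
        x0 = sample_categorical(toy_denoiser(z, a_t, prior), rng)
        z = forward_sample(x0, a_s, K, rng)
        # Corrector: resample at the same noise level, so tokens can still change.
        for _ in range(corrector_steps):
            x0 = sample_categorical(toy_denoiser(z, a_s, prior), rng)
            z = forward_sample(x0, a_s, K, rng)
    return z
```

For example, `pc_sample(np.array([0.7, 0.2, 0.1]), length=1000, num_steps=8, corrector_steps=2, rng=np.random.default_rng(0))` returns 1000 token ids. With an exact denoiser, each corrector step leaves the distribution at its noise level unchanged; with a learned, imperfect denoiser, the extra steps are what give the sampler room to revise earlier choices.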

Findings

01. PC samplers outperform ancestral sampling on language and image benchmarks.

02. Sampling quality continues to improve with more steps using PC methods.

03. Memory-efficient curriculum reduces training time by 25% and memory by 33%.

Abstract

Uniform-state discrete diffusion models excel at few-step generation and guidance due to their ability to self-correct, making them preferred over autoregressive or Masked diffusion models in these settings. However, their sampling quality plateaus with ancestral samplers as the number of steps increases. We introduce a family of Predictor-Corrector (PC) samplers for discrete diffusion that generalize prior methods and apply to arbitrary noise processes. When paired with uniform-state diffusion, our samplers outperform ancestral sampling on both language and image modeling, achieving lower generative perplexity at matched unigram entropy on OpenWebText and better FID/IS scores on CIFAR10. Crucially, unlike conventional samplers, our PC methods continue to improve with more sampling steps. Taken together, these findings call into question the assumption that Masked diffusion is the…
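The language results above are reported as generative perplexity at matched unigram entropy. Entropy is matched because repetitive samples can score a low perplexity under an evaluator model while being poor text. The sketch below assumes unigram entropy means the Shannon entropy (in nats) of the empirical token-frequency distribution of a single sample; the paper's exact evaluation protocol is not given on this page, and `unigram_entropy` is a hypothetical helper name.

```python
import math
from collections import Counter

def unigram_entropy(tokens):
    """Shannon entropy (nats) of the empirical token distribution of one sample."""
    counts = Counter(tokens)
    n = len(tokens)
    # Sum -p * log(p) over the distinct tokens that actually occur.
    return -sum((c / n) * math.log(c / n) for c in counts.values())
```

A sample that repeats one token has entropy 0, while a sample that uses four tokens equally often has entropy `math.log(4)`.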

Peer Reviews

Decision · ICLR 2026 Poster

Reviewer 01 · Rating 8 · Confidence 4

Strengths

- Clear motivation for why sampling needs improvement in discrete diffusion: standard samplers can "over-commit" and cannot "self-correct" without further hacks, sometimes including adversarial training or other approximations.
- The psi-sampler formulation is general and recovers a few previous predictor–corrector methods as special cases (though please be careful not to say "all cases in the literature"; you never know with so many papers coming out daily. I suggest "that the authors are aware o

Weaknesses

- The curriculum section assumes prior familiarity with Sahoo 2025a and gives little intuition for why that weighted-average operation helps training. Since you are devoting nearly a whole section to this method, I ask that you at least give a few sentences on how exactly the "curriculum" technique during training uses this average computation that you are approximating. Otherwise, the speedup could be relegated to an appendix section (I think "curriculum" is used 23 times in main text without b

Reviewer 02 · Rating 8 · Confidence 3

Strengths

1. I like Figure 1, which conveys the main results of the paper convincingly.
2. The paper is well written in the sense that the background section is well formulated and the main contributions of the paper are well supported by empirical results.
3. The idea of formulating non-Markovian forward processes for discrete diffusion models is quite interesting given its numerous applications in the context of continuous diffusion models in the form of DDIM.

Weaknesses

I don't have many concerns about the proposed method, but rather a few suggestions for improving the presentation of the paper.

**Presentation Issues**

1. Is there a reason for using the psi notation to denote distributions throughout the paper? The notation could probably be dropped in favor of standard notation such as p(.) or q(.), as in other works in the literature.
2. In general, a lot of intuition is missing around the sampler design in Section 3. It is not clear,

Reviewer 03 · Rating 6 · Confidence 4

Strengths

The paper derives rigorous predictor-corrector schemes for both masked and uniform state diffusion models, as well as tractable approximations for the uniform state diffusion models.

Weaknesses

See questions.

Code & Models

Repositories

https://s-sahoo.com/duo-ch2
github

Models

jdeschena/duo2-cifar10 (Hugging Face model)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques