Effective Test-Time Scaling of Discrete Diffusion through Iterative Refinement
Sanghyun Lee, Sunwoo Kim, Seungryong Kim, Jongho Park, Dongmin Park

TL;DR
This paper introduces IterRef, a novel test-time scaling method for discrete diffusion models that iteratively refines states guided by rewards, significantly improving generation quality especially under low compute budgets.
Contribution
We propose IterRef, a new reward-guided refinement technique for discrete diffusion models that explicitly refines states during inference, formalized within a Multiple-Try Metropolis (MTM) framework.
Findings
IterRef improves reward-guided generation quality across text and image domains.
It achieves significant gains under low compute budgets.
It outperforms prior state-of-the-art baselines.
Abstract
Test-time scaling through reward-guided generation remains largely unexplored for discrete diffusion models despite its potential as a promising alternative. In this work, we introduce Iterative Reward-Guided Refinement (IterRef), a novel test-time scaling method tailored to discrete diffusion that leverages reward-guided noising-denoising transitions to progressively refine misaligned intermediate states. We formalize this process within a Multiple-Try Metropolis (MTM) framework, proving convergence to the reward-aligned distribution. Unlike prior methods that assume the current state is already aligned with the reward distribution and only guide the subsequent transition, our approach explicitly refines each state in situ, progressively steering it toward the optimal intermediate distribution. Across both text and image domains, we evaluate IterRef on diverse discrete diffusion models…
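To make the Multiple-Try Metropolis idea in the abstract concrete, here is a minimal toy sketch in Python. It is not the paper's algorithm: it assumes a uniform "denoiser" (one position is re-noised and redrawn uniformly, which makes the proposal symmetric), a hypothetical count-based reward, and a target proportional to exp(beta * reward). All names (`reward`, `propose`, `mtm_step`, `refine`, `n_tries`, `n_iters`, `beta`) are illustrative assumptions, not identifiers from the paper.

```python
import math
import random


def reward(x, target_token=0):
    """Toy reward (assumption): number of positions holding the target token."""
    return sum(1 for tok in x if tok == target_token)


def propose(x, vocab_size, rng):
    """Symmetric proposal: re-noise one random position and redraw it uniformly."""
    y = list(x)
    y[rng.randrange(len(y))] = rng.randrange(vocab_size)
    return y


def mtm_step(x, vocab_size, n_tries, beta, rng):
    """One MTM step targeting pi(x) proportional to exp(beta * reward(x))."""

    def weight(state):
        # With a symmetric proposal, the MTM weight reduces to the target density.
        return math.exp(beta * reward(state))

    # 1. Draw n_tries candidates from the current state and weight them.
    cands = [propose(x, vocab_size, rng) for _ in range(n_tries)]
    cand_w = [weight(c) for c in cands]
    # 2. Select one candidate with probability proportional to its weight.
    y = rng.choices(cands, weights=cand_w, k=1)[0]
    # 3. Reference set: n_tries - 1 draws from the selected candidate, plus x itself.
    refs = [propose(y, vocab_size, rng) for _ in range(n_tries - 1)] + [x]
    ref_w = [weight(r) for r in refs]
    # 4. Generalized Metropolis acceptance test.
    accept = min(1.0, sum(cand_w) / sum(ref_w))
    return y if rng.random() < accept else x


def refine(x0, vocab_size=4, n_tries=4, n_iters=200, beta=3.0, seed=0):
    """Run n_iters MTM steps from x0; return the final state and the reward trace."""
    rng = random.Random(seed)
    x, trace = list(x0), []
    for _ in range(n_iters):
        x = mtm_step(x, vocab_size, n_tries, beta, rng)
        trace.append(reward(x))
    return x, trace


if __name__ == "__main__":
    final_state, rewards = refine([1] * 8, n_iters=2000)
    print(final_state, sum(rewards[500:]) / len(rewards[500:]))
```

In this sketch, `n_iters` and `n_tries` stand in for the number of refinement iterations and candidates per iteration. A real discrete diffusion model would replace the uniform redraw with its own noising-denoising transition, and the weights would then need the proposal probabilities as well.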
Peer Reviews
Decision·Submitted to ICLR 2026
* Introduces a principled, MCMC-based (Multiple-Try Metropolis) test-time refinement specifically for discrete diffusion, a gap in prior work.
* Provides a convergence guarantee toward a reward-aligned distribution, not just a heuristic.
* Performance: Consistent, large empirical gains (text + image), especially under low compute budgets (up to 8× efficiency).
* Flexibility: Supports selective refinement timesteps and adjustable iteration/candidate trade-offs.
* Insight: Reveals that late denoi
* Theory–practice gap: The practical MTM variant simplifies away exact detailed balance; convergence guarantees may not strictly hold.
* Limited evaluation metrics: Focuses mainly on reward scores (toxicity, CLIPScore) without checking fluency/diversity side effects.
* Baseline fairness: Competing methods may not be fully re-tuned for the discrete setting.
* Compute realism: “Equal NFEs” ignores real wall-clock cost differences between reward and generative models.
* Missing failure analysis: No qualitati
- Clear motivation for iteratively refining partially unmasked sequences to correct errors before transitioning to the next denoised state, which is backed by previous work [1]
- Well-defined and actionable knobs (refinement steps $k$, number of proposals $N$, and timestep set $U$), with ablation studies of the effect of each parameter on downstream performance.
- The paper discusses some modifications to reduce the computational cost of IterRef, which can help when using
My main concerns are with the significant changes between IterRef and MTM that require re-evaluating the theoretical result, some improvements to the experimental setup (fair calculation of NFEs, wall-clock time comparison, better metrics, using multiple seeds, and one important baseline), and clarifications on certain statements in the paper.

### Theoretical guarantee and design choices of MTM

The proposed algorithm differs from standard MTM in several important ways, and these choices are no
The overall idea is similar to the predictor-corrector framework. While this concept has been extensively explored in recent years, the use of the Multiple-Try Metropolis method as the corrector is novel, making the contribution distinctive. Moreover, the empirical results are promising, showing significant improvements over guidance-, SMC- and BoN-based methods, etc.
My main concern lies in the clarity and coherence of the paper’s narrative. The presentation of the proposed method and its connection to the underlying theory is difficult to follow. Key algorithmic details, such as the roles of the balancing function, importance weights, and acceptance rate, are either missing or insufficiently explained (see questions below), making it challenging to fully understand how the sampling procedure operates. As a result, while the empirical results appear promisin
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Model Reduction and Neural Networks · Domain Adaptation and Few-Shot Learning
