Inference-Time Scaling of Discrete Diffusion Models via Importance Weighting and Optimal Proposal Design
Zijing Ou, Chinmay Pani, Yingzhen Li

TL;DR
This paper introduces an SMC-based framework for inference-time control of discrete diffusion models, improving their scalability, controllability, and sample quality across diverse applications.
Contribution
It develops a principled importance weighting and proposal design method for scalable inference in discrete diffusion models, with practical approximations and broad empirical validation.
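Importance weighting is usually done in log space. The sketch below is a generic illustration, not the authors' code. It self-normalises a vector of log-weights with the log-sum-exp trick and computes the effective sample size (ESS), the usual diagnostic an SMC sampler uses to decide when to resample. The function names are hypothetical.

```python
import math


def normalise_log_weights(log_w):
    """Return self-normalised weights w_i = exp(log_w_i) / sum_j exp(log_w_j).

    Subtracting the maximum before exponentiating (log-sum-exp trick)
    avoids overflow when log-weights are large.
    """
    m = max(log_w)
    unnorm = [math.exp(lw - m) for lw in log_w]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def effective_sample_size(log_w):
    """ESS = 1 / sum_i w_i^2: 1 when one particle dominates, N when weights are equal."""
    w = normalise_log_weights(log_w)
    return 1.0 / sum(wi * wi for wi in w)


print(effective_sample_size([0.0, 0.0, 0.0, 0.0]))  # equal weights -> 4.0
print(effective_sample_size([0.0, -50.0, -50.0]))   # one dominant weight -> close to 1
```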
Findings
Enhanced controllability in discrete diffusion models
Improved sample quality demonstrated across tasks
Effective inference-time scaling via SMC methods
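A toy sketch can make the SMC loop concrete. It rests on explicit assumptions: a hypothetical "denoiser" that unmasks one uniformly chosen position per step with a uniformly random token, a hypothetical reward that counts tokens equal to 1, and intermediate targets proportional to p(x_t)·exp(λ·r(x_t)). With the base transition as the proposal, the incremental importance weight is exp(λ·(r(x_t) − r(x_{t+1}))), and particles are resampled whenever the ESS drops below half the particle count. This is a generic reward-tilted SMC scheme for illustration, not the paper's specific intermediate targets or proposals.

```python
import math
import random

MASK = -1  # hypothetical mask token id


def base_step(x, vocab_size, rng):
    """Hypothetical stand-in for a masked diffusion denoiser: unmask one masked
    position, chosen uniformly, with a uniformly random token."""
    masked = [i for i, tok in enumerate(x) if tok == MASK]
    i = rng.choice(masked)
    y = list(x)
    y[i] = rng.randrange(vocab_size)
    return y


def reward(x):
    """Hypothetical reward: number of positions decoded as token 1."""
    return sum(1 for tok in x if tok == 1)


def smc_sample(num_particles=64, length=8, vocab_size=4, lam=1.0, seed=0):
    """Reward-tilted SMC whose final target is p(x_0) * exp(lam * r(x_0))."""
    rng = random.Random(seed)
    particles = [[MASK] * length for _ in range(num_particles)]
    log_w = [0.0] * num_particles
    for _ in range(length):  # exactly one token is unmasked per step
        for n in range(num_particles):
            new = base_step(particles[n], vocab_size, rng)
            # Proposal = base transition, so the incremental weight is the ratio
            # of successive tilts: exp(lam * (r(new) - r(old))).
            log_w[n] += lam * (reward(new) - reward(particles[n]))
            particles[n] = new
        # Self-normalise (log-sum-exp) and compute the effective sample size.
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        total = sum(w)
        w = [wi / total for wi in w]
        ess = 1.0 / sum(wi * wi for wi in w)
        if ess < num_particles / 2:
            # Multinomial resampling; weights are reset to uniform afterwards.
            idx = rng.choices(range(num_particles), weights=w, k=num_particles)
            particles = [list(particles[i]) for i in idx]
            log_w = [0.0] * num_particles
    return particles, log_w


def weighted_mean_reward(particles, log_w):
    """Self-normalised importance estimate of E[r(x_0)] under the tilted target."""
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    return sum(wi * reward(p) for wi, p in zip(w, particles)) / sum(w)


for lam in (0.0, 2.0):
    print(lam, weighted_mean_reward(*smc_sample(lam=lam, seed=1)))
```

With `lam=0.0` all weights stay zero and the sampler reduces to independent draws from the base process; increasing `lam` shifts the weighted samples towards higher reward.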
Abstract
Discrete diffusion models have become highly effective across various domains. However, real-world applications often require the generative process to adhere to certain constraints. To this end, we propose a Sequential Monte Carlo (SMC) framework that enables scalable inference-time control of discrete diffusion models through principled importance weighting and optimal proposal construction. Specifically, our approach derives tractable importance weights for a range of intermediate targets and characterises the optimal proposal, for which we develop two practical approximations: a first-order gradient-based approximation and an amortised proposal trained to minimise the log-variance of the importance weights. Empirical results across synthetic tasks, language modelling, biology design, and text-to-image generation demonstrate that our framework enhances controllability and sample…
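The log-variance objective mentioned in the abstract can be illustrated on a toy problem. The sketch below is not the authors' amortised proposal and rests on several assumptions. The proposal q_φ is a single categorical distribution softmax(φ) rather than a network conditioned on the current noisy state. The target γ is known only up to a normalising constant. The variance of log γ(x) − log q_φ(x) is computed exactly under a fixed, uniform reference distribution instead of being estimated from samples. Because the objective is a variance, the unknown normalising constant of γ drops out, and the loss reaches zero exactly when q_φ ∝ γ.

```python
import math


def log_softmax(phi):
    """Numerically stable log-softmax of a list of logits."""
    m = max(phi)
    lse = m + math.log(sum(math.exp(p - m) for p in phi))
    return [p - lse for p in phi]


def log_variance_loss_and_grad(phi, log_gamma, ref):
    """Var_ref[log gamma(x) - log q_phi(x)] and its gradient w.r.t. phi.

    ref holds fixed ("detached") reference probabilities over the states.
    With a_k = log gamma_k - log q_k, the gradient is -2 * ref_j * (a_j - mean_a),
    because sum_k ref_k * (a_k - mean_a) = 0 cancels the softmax cross-terms.
    """
    log_q = log_softmax(phi)
    a = [lg - lq for lg, lq in zip(log_gamma, log_q)]
    mean_a = sum(r * ak for r, ak in zip(ref, a))
    loss = sum(r * (ak - mean_a) ** 2 for r, ak in zip(ref, a))
    grad = [-2.0 * r * (ak - mean_a) for r, ak in zip(ref, a)]
    return loss, grad


def train_proposal(log_gamma, steps=500, lr=1.0):
    """Plain gradient descent on the log-variance loss (uniform reference assumed)."""
    k = len(log_gamma)
    phi = [0.0] * k
    ref = [1.0 / k] * k
    loss = float("inf")
    for _ in range(steps):
        loss, grad = log_variance_loss_and_grad(phi, log_gamma, ref)
        phi = [p - lr * g for p, g in zip(phi, grad)]
    return phi, loss


# Unnormalised target gamma = (1, 2, 3, 4); the trained proposal recovers gamma / 10.
phi, loss = train_proposal([math.log(v) for v in (1.0, 2.0, 3.0, 4.0)])
print([round(math.exp(lq), 3) for lq in log_softmax(phi)])  # [0.1, 0.2, 0.3, 0.4]
```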
Peer Reviews
Decision: ICLR 2026 Poster
The reviewer finds that the paper is clearly written and explained.
The reviewer thinks that this paper has quite a few weaknesses, listed below: (1) First, the novelty of the SMC-based inference-time scaling framework seems quite limited. Specifically, the distribution path (with respect to the time variable $t$) is the same as that of [1], even though the authors do show how the discrete-time formulation proposed in this paper converges to [1] in the continuum limit. Moreover, the idea of variance
The paper is well-written and clearly organized, with rigorous mathematical derivations, although some of the notation could be improved for clarity. The SMC framework for MDMs (masked diffusion models) is novel and well-motivated. The experiments are also comprehensive, covering various tasks from synthetic toy examples to large-scale language, DNA and image generation tasks, and the results demonstrate the effectiveness of the proposed methods. I appreciate the extensive ablation studies and analyses prov
I don't see any significant weaknesses, but the paper could be further improved by including inference-time scaling baselines for comparison, which are not presented in the main text. I checked the appendix and found some comparisons with other fine-tuning methods such as DRAKES for the DNA task, but there is no comparison with purely inference-time control methods (e.g., SVDD, arxiv:2408.08252). The authors are encouraged to include such comparisons and report the results, but I u
The paper develops two different proposals for SMC for discrete diffusion models. Both proposals are rigorously motivated and are practical to implement, with the empirical results showing the efficacy of both approaches.
While the underlying results have been known (and cited) in the context of designing proposals for MCMC and discrete generative models (Grathwohl et al., 2021; Zhang et al., 2022), their use in discrete diffusion is well motivated and shown to produce significant improvements.
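The first-order trick referenced above can be sketched for a factorised denoiser. Assume, hypothetically, that the base model gives independent per-position logits for the positions being decoded and that the reward r is differentiable with respect to a one-hot relaxation of the sequence. For reward-tilted intermediate targets of the form p(x_t)·exp(λ·r(x_t)), the locally optimal proposal is proportional to p(x_t | x_{t+1})·exp(λ·r(x_t)); replacing r(x_t) by its first-order Taylor expansion around the current state makes the tilt factorise over positions, so each position can be sampled from softmax(logits + λ·∂r/∂x). This follows the spirit of Grathwohl et al. (2021) but is an illustration, not necessarily the paper's exact proposal; the function name and inputs are hypothetical.

```python
import math


def tilted_first_order_proposal(base_logits, reward_grad, lam=1.0):
    """Per-position proposal q_i(v) proportional to p_theta_i(v) * exp(lam * dr/dx_{i,v}).

    base_logits[i][v]: hypothetical denoiser logits for position i being decoded.
    reward_grad[i][v]: gradient of a differentiable reward w.r.t. the one-hot entry
    x_{i,v}, evaluated at the current state. Under the first-order expansion
    r(x_new) ~ r(x_old) + grad . (x_new - x_old), the terms r(x_old) and grad . x_old
    are constant in x_new and cancel in each position's normalisation.
    """
    proposal = []
    for logits_i, grad_i in zip(base_logits, reward_grad):
        scores = [l + lam * g for l, g in zip(logits_i, grad_i)]
        m = max(scores)  # subtract the max for a numerically stable softmax
        w = [math.exp(s - m) for s in scores]
        total = sum(w)
        proposal.append([wi / total for wi in w])
    return proposal


# Uniform base logits; the reward gradient doubles the odds of token 1.
probs = tilted_first_order_proposal([[0.0, 0.0, 0.0]], [[0.0, math.log(2.0), 0.0]])
print([round(p, 3) for p in probs[0]])  # [0.25, 0.5, 0.25]
```

Setting `lam=0.0` recovers the base denoiser's softmax, so the gradient term acts purely as a cheap, local correction towards the optimal tilted proposal.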
Taxonomy
Topics: Probabilistic and Robust Engineering Design · Model Reduction and Neural Networks
