Inference-Time Scaling of Diffusion Language Models via Trajectory Refinement
Meihua Dang, Jiaqi Han, Minkai Xu, Kai Xu, Akash Srivastava, Stefano Ermon

TL;DR
This paper introduces PG-DLM, an inference-time algorithm for diffusion language models that refines entire generation trajectories, improving the efficiency and accuracy of reward-guided generation without retraining the model.
Contribution
The paper proposes particle Gibbs sampling for diffusion language models, enabling trajectory-level refinement and adaptive compute allocation during inference.
Findings
PG-DLM outperforms prior methods on reward-guided tasks.
Achieves 90.07% accuracy on GSM8K with 2.9 particles.
Achieves 94.47% accuracy on GSM8K with 16 particles.
Abstract
Discrete diffusion models have recently emerged as strong alternatives to autoregressive language models, matching their performance through large-scale training. However, inference-time control remains relatively underexplored. In this work, we study how to steer generation toward desired rewards without retraining the models. Prior methods typically resample or filter within a single denoising trajectory, optimizing rewards step-by-step without trajectory-level refinement. We introduce particle Gibbs sampling for diffusion language models (PG-DLM), an inference-time algorithm enabling trajectory-level refinement. PG-DLM constructs a Markov chain over full denoising trajectories and applies a conditional sequential Monte Carlo kernel to resample them. By doing so, PG-DLM introduces a new scaling axis, the number of refinement iterations, which is unavailable to prior methods.…
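The trajectory-level refinement described in the abstract can be illustrated with a minimal toy sketch of particle Gibbs built on a conditional sequential Monte Carlo sweep. This is not the authors' implementation. Everything in it is an assumption made for illustration: the uniform left-to-right `denoise_step` stands in for a real diffusion language model, `reward` is a toy count of a target token, and the names `conditional_smc`, `particle_gibbs`, `num_particles`, `num_iters`, and `beta` are hypothetical.

```python
import math
import random

MASK = -1  # hypothetical mask token id for this toy example


def reward(seq, target=1):
    """Toy reward (assumption): number of unmasked tokens equal to `target`."""
    return sum(1 for tok in seq if tok == target)


def denoise_step(seq, vocab_size, rng):
    """Toy stand-in for a denoiser: fill the leftmost masked position uniformly."""
    out = list(seq)
    out[out.index(MASK)] = rng.randrange(vocab_size)
    return out


def conditional_smc(ref, length, vocab_size, num_particles, beta, rng):
    """Run one SMC sweep over denoising trajectories.

    If `ref` is None this is plain SMC. Otherwise particle 0 is pinned to the
    reference trajectory `ref`, which makes the sweep a conditional SMC kernel.
    Returns one full trajectory (a list of length + 1 states).
    """
    trajs = [[[MASK] * length] for _ in range(num_particles)]
    weights = [1.0] * num_particles
    for t in range(1, length + 1):
        # Resample ancestors in proportion to the previous step's weights.
        ancestors = rng.choices(range(num_particles), weights=weights,
                                k=num_particles)
        if ref is not None:
            ancestors[0] = 0  # the reference particle always keeps its own path
        new_trajs, new_weights = [], []
        for k, a in enumerate(ancestors):
            parent = trajs[a]
            if ref is not None and k == 0:
                state = ref[t]  # follow the retained trajectory exactly
            else:
                state = denoise_step(parent[-1], vocab_size, rng)
            new_trajs.append(parent + [state])
            # Incremental weight: exponentiated change in the partial reward.
            gain = reward(state) - reward(parent[-1])
            new_weights.append(math.exp(beta * gain))
        trajs, weights = new_trajs, new_weights
    # Pick the output trajectory in proportion to the final weights.
    chosen = rng.choices(range(num_particles), weights=weights, k=1)[0]
    return trajs[chosen]


def particle_gibbs(length, vocab_size, num_particles, num_iters, beta, seed=0):
    """Markov chain over full trajectories: each sweep refines the last one."""
    rng = random.Random(seed)
    ref = None
    for _ in range(num_iters):
        ref = conditional_smc(ref, length, vocab_size, num_particles, beta, rng)
    return ref


if __name__ == "__main__":
    traj = particle_gibbs(length=6, vocab_size=3, num_particles=4,
                          num_iters=3, beta=2.0)
    print("final sequence:", traj[-1], "reward:", reward(traj[-1]))
```

In this sketch, `num_particles` and `num_iters` are two separate compute knobs: the first widens each sweep, the second adds refinement passes over the retained trajectory. The second knob corresponds to the additional scaling axis the abstract refers to.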
