$S^3$: Stratified Scaling Search for Test-Time in Diffusion Language Models
Ahsan Bilal, Muhammad Ahmed Mohsin, Muhammad Umer, Asad Aali, Muhammad Usman Khanzada, Muhammad Usman Rafique, Zihao He, Emily Fox, Dean F. Hougen

TL;DR
This paper introduces $S^3$, a verifier-guided search method that reallocates compute during diffusion model denoising to improve output quality without retraining.
Contribution
The paper proposes $S^3$, a novel stratified search technique that enhances diffusion language model outputs by dynamically resampling trajectories guided by a lightweight verifier.
Findings
$S^3$ improves performance on mathematical reasoning benchmarks.
It achieves significant gains without changing the underlying model or decoding schedule.
The method effectively approximates a reward-tilted sampling distribution.
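A reward-tilted distribution is commonly written as below. The notation is ours and is an assumption: $p_\theta$ is the base DLM distribution, $r$ is the verifier reward, and $\beta > 0$ is a temperature. The summary does not state the exact form that $S^3$ targets.

$$\pi_\beta(x) = \frac{p_\theta(x)\,\exp\big(r(x)/\beta\big)}{\sum_{x'} p_\theta(x')\,\exp\big(r(x')/\beta\big)}$$

As $\beta \to \infty$ the tilt vanishes and $\pi_\beta$ reduces to $p_\theta$, which is the distribution plain best-of-$N$ sampling keeps drawing from. As $\beta \to 0$ it concentrates on the highest-reward outputs within the support of $p_\theta$.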
Abstract
Test-time scaling investigates whether a fixed diffusion language model (DLM) can generate better outputs when given more inference compute, without additional training. However, naive best-of-$N$ sampling is fundamentally limited because it repeatedly draws from the same base diffusion distribution, whose high-probability regions are often misaligned with high-quality outputs. We propose $S^3$ (Stratified Scaling Search), a classical verifier-guided search method that improves generation by reallocating compute during the denoising process rather than only at the final output stage. At each denoising step, $S^3$ expands multiple candidate trajectories, evaluates them with a lightweight reference-free verifier, and selectively resamples promising candidates while preserving diversity within the search frontier. This procedure effectively approximates a reward-tilted sampling…
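The expand, score, and resample loop described in the abstract can be sketched as a toy program. This is a minimal illustration under stated assumptions and not the authors' implementation. `toy_denoise_step` is a stand-in that fills one masked position at random, and `count_ones` is a stand-in verifier. The temperature `beta` and every other name here are also hypothetical. The summary does not specify how the paper keeps the frontier diverse, so this sketch substitutes standard stratified resampling from sequential Monte Carlo as one plausible choice.

```python
import math
import random
from typing import List, Optional, Sequence

MASK: Optional[int] = None  # hypothetical marker for a still-masked position


def stratified_resample(weights: Sequence[float], k: int, rng: random.Random) -> List[int]:
    """Pick k indices in proportion to weights, one uniform draw per stratum [i/k, (i+1)/k)."""
    total = sum(weights)
    cdf, acc = [], 0.0
    for w in weights:
        acc += w / total
        cdf.append(acc)
    cdf[-1] = 1.0  # guard against floating-point drift in the last bucket
    picks, j = [], 0
    for i in range(k):
        u = (i + rng.random()) / k  # strata increase with i, so j only moves forward
        while cdf[j] < u:
            j += 1
        picks.append(j)
    return picks


def toy_denoise_step(seq: List[Optional[int]], vocab: Sequence[int], rng: random.Random):
    """Hypothetical stand-in for one DLM denoising step: fill the leftmost mask at random."""
    out = list(seq)
    out[out.index(MASK)] = rng.choice(vocab)
    return out


def s3_like_search(length, num_particles, branch, beta, verifier, vocab, seed=0):
    """Toy verifier-guided search over denoising trajectories (an assumption, not S^3 itself)."""
    rng = random.Random(seed)
    frontier = [[MASK] * length for _ in range(num_particles)]
    for _ in range(length):  # the toy denoiser unmasks exactly one position per step
        # Expand: sample `branch` continuations of every trajectory in the frontier.
        children = [toy_denoise_step(p, vocab, rng) for p in frontier for _ in range(branch)]
        # Score partial trajectories with the reference-free verifier.
        scores = [verifier(c) for c in children]
        # Tilt by exp(score / beta), subtracting the max score for numerical stability.
        top = max(scores)
        weights = [math.exp((s - top) / beta) for s in scores]
        # Resample a fixed-size frontier; stratification spreads picks across the weight mass.
        frontier = [children[i] for i in stratified_resample(weights, num_particles, rng)]
    return max(frontier, key=verifier)


def count_ones(seq) -> int:
    """Hypothetical verifier: number of unmasked positions equal to 1."""
    return sum(1 for t in seq if t == 1)


if __name__ == "__main__":
    best = s3_like_search(length=8, num_particles=4, branch=4, beta=0.1,
                          verifier=count_ones, vocab=[0, 1], seed=0)
    print(best, count_ones(best))
```

Weighting children by `exp(score / beta)` ties the loop to a reward-tilted target. A large `beta` approaches plain sampling from the base model, while a small `beta` keeps almost only the top-scoring partial trajectories. Resampling a fixed number of particles keeps the compute per denoising step constant.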
