$S^3$: Stratified Scaling Search for Test-Time in Diffusion Language Models

Ahsan Bilal; Muhammad Ahmed Mohsin; Muhammad Umer; Asad Aali; Muhammad Usman Khanzada; Muhammad Usman Rafique; Zihao He; Emily Fox; Dean F. Hougen

arXiv:2604.06260·cs.LG·April 9, 2026

TL;DR

This paper introduces $S^3$, a verifier-guided search method that reallocates inference compute across the denoising steps of a fixed diffusion language model to improve output quality without retraining.

Contribution

The paper proposes $S^3$, a novel stratified search technique that enhances diffusion language model outputs by dynamically resampling trajectories guided by a lightweight verifier.

Findings

1. $S^3$ improves performance on mathematical reasoning benchmarks.

2. It achieves significant gains without changing the underlying model or decoding schedule.

3. The method effectively approximates a reward-tilted sampling distribution.
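
For orientation, a reward-tilted distribution is commonly written in the form below. This is the generic textbook form rather than the paper's own notation: the symbols $p_0$ (base diffusion distribution), $r$ (verifier score), $\lambda$ (temperature), and $Z$ (normalizer) are assumptions introduced here.

```latex
p^{*}(x) \;=\; \frac{1}{Z}\, p_{0}(x)\, \exp\!\big(r(x)/\lambda\big),
\qquad
Z \;=\; \sum_{x'} p_{0}(x')\, \exp\!\big(r(x')/\lambda\big)
```

Under this form, outputs that the verifier scores highly receive more probability mass than under $p_0$ alone, and a smaller $\lambda$ sharpens the tilt.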

Abstract

Test-time scaling investigates whether a fixed diffusion language model (DLM) can generate better outputs when given more inference compute, without additional training. However, naive best-of-$K$ sampling is fundamentally limited because it repeatedly draws from the same base diffusion distribution, whose high-probability regions are often misaligned with high-quality outputs. We propose $S^{3}$ (Stratified Scaling Search), a classical verifier-guided search method that improves generation by reallocating compute during the denoising process rather than only at the final output stage. At each denoising step, $S^{3}$ expands multiple candidate trajectories, evaluates them with a lightweight reference-free verifier, and selectively resamples promising candidates while preserving diversity within the search frontier. This procedure effectively approximates a reward-tilted sampling…
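
The expand, score, and resample loop described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration and not the paper's implementation: `denoise_step` is a random stand-in for a real DLM denoising step, `toy_verifier` is a made-up score, and `stratified_resample` is the standard stratified resampling scheme from particle filtering, which may differ from what "stratified" means in $S^3$.

```python
import math
import random

MASK = -1  # hypothetical marker for a position that is still masked


def denoise_step(seq, vocab_size, rng):
    """Toy stand-in for one denoising step: reveal one random masked position."""
    masked = [i for i, tok in enumerate(seq) if tok == MASK]
    out = list(seq)
    if masked:
        out[rng.choice(masked)] = rng.randrange(vocab_size)
    return out


def toy_verifier(seq):
    """Made-up score: number of revealed tokens equal to 0 (higher is better)."""
    return float(sum(1 for tok in seq if tok == 0))


def stratified_resample(weights, n, rng):
    """Draw n indices, one uniform draw per stratum [i/n, (i+1)/n)."""
    total = sum(weights)
    cumulative, acc = [], 0.0
    for w in weights:
        acc += w / total
        cumulative.append(acc)
    indices, j = [], 0
    for i in range(n):
        u = (i + rng.random()) / n  # draws increase across strata
        while j < len(cumulative) - 1 and cumulative[j] <= u:
            j += 1
        indices.append(j)
    return indices


def verifier_guided_search(length=8, vocab_size=4, width=8, expand=3,
                           temperature=0.5, seed=0):
    """Keep `width` partial sequences; expand, score, and resample at each step."""
    rng = random.Random(seed)
    frontier = [[MASK] * length for _ in range(width)]
    for _step in range(length):
        # Expand: each frontier member spawns `expand` candidate continuations.
        candidates = [denoise_step(seq, vocab_size, rng)
                      for seq in frontier for _ in range(expand)]
        # Evaluate: score every candidate with the (toy) verifier.
        scores = [toy_verifier(c) for c in candidates]
        top = max(scores)
        # Tilt: weights proportional to exp(score / temperature), shifted for stability.
        weights = [math.exp((s - top) / temperature) for s in scores]
        # Resample: stratified draws favor high scores but keep some spread.
        keep = stratified_resample(weights, width, rng)
        frontier = [candidates[k] for k in keep]
    return max(frontier, key=toy_verifier)


if __name__ == "__main__":
    print(verifier_guided_search())
```

In this sketch the frontier width and the expansion factor are the knobs that trade extra inference compute for search coverage, while the model stand-in itself is never modified.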
