Self-Rewarding Sequential Monte Carlo for Masked Diffusion Language Models
Ziwei Luo, Ziqi Jin, Lei Wang, Lidong Bing, Thomas B. Schön

TL;DR
This paper introduces a self-rewarding sequential Monte Carlo (SMC) method for masked diffusion language models (MDLMs). It improves sampling diversity and quality through parallel trajectory exploration and confidence-based resampling, with no additional training.
Contribution
The paper proposes a self-rewarding SMC algorithm that improves sampling in MDLMs by using trajectory-level confidence as the signal for importance weighting and resampling.
Findings
Significant improvement in sampling quality on various benchmarks.
Enhanced diversity in generated samples without extra training.
Effective conversion of parallel inference into higher-quality outputs.
Abstract
This work presents self-rewarding sequential Monte Carlo (SMC), an inference-time scaling algorithm enabling effective sampling of masked diffusion language models (MDLMs). Our algorithm stems from the observation that most existing MDLMs rely on a confidence-based sampling strategy, where only tokens with the highest prediction confidence are preserved at each step. This restricts the generation to a noise-sensitive, greedy decoding paradigm, resulting in an inevitable collapse in the diversity of possible paths. We address this problem by launching multiple interacting diffusion processes in parallel, referred to as particles, for trajectory exploration. Importantly, we introduce the trajectory-level confidence as a self-rewarding signal for assigning particle importance weights. During sampling, particles are iteratively weighted and resampled to systematically steer generation…
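The weight-and-resample loop described in the abstract can be illustrated with a minimal toy sketch. Everything in it is an assumption made for illustration, not the authors' implementation: `toy_model` is a hypothetical stand-in that returns random per-position token probabilities, each step commits only the single most confident token per particle, the importance weight is taken to be the log-confidence of that committed token, and particles are resampled multinomially after every step. The paper's exact weight definition and resampling schedule are not given in the excerpt above.

```python
import numpy as np

MASK = -1  # hypothetical id marking a still-masked position


def toy_model(tokens, vocab_size, rng):
    """Hypothetical stand-in for an MDLM: random per-position token probabilities."""
    logits = rng.normal(size=(len(tokens), vocab_size))
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    return probs / probs.sum(axis=1, keepdims=True)


def smc_sample(seq_len=8, vocab_size=16, n_particles=4, seed=0):
    rng = np.random.default_rng(seed)
    # Every particle starts as a fully masked sequence.
    particles = np.full((n_particles, seq_len), MASK, dtype=int)
    for _ in range(seq_len):  # one token is unmasked per step
        log_w = np.zeros(n_particles)
        for i in range(n_particles):
            probs = toy_model(particles[i], vocab_size, rng)
            masked = np.flatnonzero(particles[i] == MASK)
            # Sample a candidate token for each masked position.
            cand = np.array([rng.choice(vocab_size, p=probs[j]) for j in masked])
            conf = probs[masked, cand]
            # Confidence-based unmasking: commit the most confident candidate.
            k = int(np.argmax(conf))
            particles[i, masked[k]] = cand[k]
            # Assumed self-reward: log-confidence of the committed token.
            log_w[i] = np.log(conf[k])
        # Normalize weights in a numerically stable way.
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        # Multinomial resampling: confident particles are duplicated,
        # low-confidence ones tend to be dropped.
        idx = rng.choice(n_particles, size=n_particles, p=w)
        particles = particles[idx]
    return particles


samples = smc_sample()
```

With the default arguments, `samples` is an array of four fully unmasked toy sequences of length eight. Swapping `toy_model` for a real MDLM forward pass would be the natural next step, though how the paper accumulates trajectory-level confidence across steps should be taken from the paper itself rather than from this sketch.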
Taxonomy
Topics: Topic Modeling · Generative Adversarial Networks and Image Synthesis · Machine Learning in Healthcare
