Sampling for Quality: Training-Free Reward-Guided LLM Decoding via Sequential Monte Carlo
Jelena Markovic-Voronov, Wenhui Zhu, Bo Long, Zhipeng Wang, Suyash Gupta, Kayhan Behdin, Bee-Chung Chen, Deepak Agarwal

TL;DR
This paper presents a training-free, reward-guided decoding framework for large language models using Sequential Monte Carlo methods, improving sequence-level quality without retraining models.
Contribution
The paper introduces a novel inference-time sampling approach that optimizes sequence-level rewards and is reported to surpass existing decoding strategies and reinforcement learning methods.
Findings
Reports significant performance improvements on code generation and mathematical reasoning tasks.
Up to 54.9% performance gain on HumanEval with 7B models.
Outperforms the reinforcement learning method GRPO across benchmarks.
Abstract
We introduce a principled probabilistic framework for reward-guided decoding in large language models, addressing the limitations of standard decoding methods that optimize token-level likelihood rather than sequence-level quality. Our method defines a reward-augmented target distribution over complete sequences by combining model transition probabilities with prefix-dependent reward potentials. Importantly, the approach is training-free: it leaves model weights unchanged and instead modifies the inference distribution via reward potentials, with all gains arising purely from inference-time sampling. To sample from this distribution, we develop Sequential Monte Carlo algorithms, including a computationally efficient prefix-only variant and a lookahead variant whose intermediate targets match the exact marginals of the full sequence distribution. The framework also integrates…
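To make the prefix-only idea concrete, the following is a minimal sketch of a generic Sequential Monte Carlo decoder, not the authors' implementation. It assumes the target has the form p(x) times a terminal potential psi(x), with incremental weights given by the ratio of consecutive prefix potentials. The toy vocabulary `VOCAB`, the uniform `base_model_probs` stand-in for an LLM, the `log_potential` reward, and the `smc_decode` function with its parameters are all hypothetical names chosen for illustration.

```python
import math
import random

VOCAB = ["a", "b", "c"]  # hypothetical toy vocabulary


def base_model_probs(prefix):
    """Stand-in for an LLM's next-token distribution (assumed uniform here)."""
    return [1.0 / len(VOCAB)] * len(VOCAB)


def log_potential(prefix, beta=2.0):
    """Hypothetical prefix-dependent reward potential: log psi = beta * count('a')."""
    return beta * prefix.count("a")


def normalize(log_w):
    """Turn log-weights into normalized weights, subtracting the max for stability."""
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return [x / total for x in w]


def smc_decode(num_particles=200, length=5, ess_threshold=0.5, seed=0):
    rng = random.Random(seed)
    particles = [[] for _ in range(num_particles)]
    log_w = [0.0] * num_particles
    for _ in range(length):
        for i, prefix in enumerate(particles):
            # Propose the next token from the unchanged base model.
            token = rng.choices(VOCAB, weights=base_model_probs(prefix))[0]
            new_prefix = prefix + [token]
            # Incremental weight: ratio of potentials on the new and old prefix.
            log_w[i] += log_potential(new_prefix) - log_potential(prefix)
            particles[i] = new_prefix
        w = normalize(log_w)
        ess = 1.0 / sum(x * x for x in w)  # effective sample size
        if ess < ess_threshold * num_particles:
            # Multinomial resampling, then reset to uniform weights.
            idx = rng.choices(range(num_particles), weights=w, k=num_particles)
            particles = [list(particles[j]) for j in idx]
            log_w = [0.0] * num_particles
    return particles, normalize(log_w)


particles, weights = smc_decode()
avg_a = sum(w * p.count("a") for p, w in zip(particles, weights))
```

In this toy setup the base model emits `"a"` one third of the time, so the weighted average `avg_a` should land well above the base expectation of 5/3, which shows the potentials steering the samples while the model itself is untouched. A real use would replace `base_model_probs` with an LLM's next-token probabilities and `log_potential` with a task reward evaluated on prefixes.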
