Guided Speculative Inference for Efficient Test-Time Alignment of LLMs

Jonathan Geuter; Youssef Mroueh; David Alvarez-Melis

arXiv:2506.04118·cs.LG·April 28, 2026

TL;DR

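Guided Speculative Inference (GSI) is an algorithm for reward-guided decoding in large language models that draws speculative samples from a small auxiliary model. It achieves higher accuracy than soft best-of-$n$ with the small model and than reward-guided speculative decoding (Liao et al., 2025), while reducing end-to-end latency by up to 28%.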

Contribution

GSI combines soft best-of-$n$ test-time scaling, a reward model $r(x, y)$, and speculative samples from a small auxiliary model $\pi_{S}$ to provably approximate the optimal tilted policy of the base model $\pi_{B}$ and its expected reward.

Findings

01

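GSI achieves higher accuracy than standard soft best-of-$n$ with the small auxiliary model $\pi_{S}$, and in certain settings even outperforms soft best-of-$n$ with the base model $\pi_{B}$.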

02

GSI reduces end-to-end latency by up to 28%.

03

GSI outperforms reward-guided speculative decoding (Liao et al., 2025).

Abstract

We propose Guided Speculative Inference (GSI), a novel algorithm for efficient reward-guided decoding in large language models. GSI combines soft best-of-$n$ test-time scaling with a reward model $r(x, y)$ and speculative samples from a small auxiliary model $\pi_{S}(y \mid x)$. We provably approximate both the optimal tilted policy $\pi_{\beta, B}(y \mid x) \propto \pi_{B}(y \mid x) \exp(\beta r(x, y))$ of soft best-of-$n$ under the base model $\pi_{B}$ and the expected reward under the optimal policy. In experiments on reasoning benchmarks (MATH500, OlympiadBench, Minerva Math, MMLU-STEM, GSM8K) and across different model families, our method achieves higher accuracy than standard soft best-of-$n$ with $\pi_{S}$ and reward-guided speculative decoding (Liao et al., 2025), and in certain settings even outperforms soft best-of-$n$ with $\pi_{B}$, while reducing end-to-end latency by up to…
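For orientation, the soft best-of-$n$ selection step that the abstract builds on can be sketched as follows. This is a minimal illustration and not the authors' implementation. It assumes the $n$ candidates and their rewards $r(x, y_i)$ have already been computed, and the names `soft_bon_probs` and `soft_best_of_n` are hypothetical. The correction GSI applies when candidates come from $\pi_{S}$ rather than $\pi_{B}$ is not described in this excerpt and is not reproduced here.

```python
import math
import random


def soft_bon_probs(rewards, beta):
    """Selection probabilities proportional to exp(beta * r_i).

    Subtracting the maximum reward first keeps exp() from overflowing
    and does not change the normalized probabilities.
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    top = max(rewards)
    weights = [math.exp(beta * (r - top)) for r in rewards]
    total = sum(weights)
    return [w / total for w in weights]


def soft_best_of_n(candidates, rewards, beta, rng=random):
    """Sample one candidate according to the soft best-of-n weights.

    `candidates` and `rewards` are placeholders (an assumption of this
    sketch): in practice they would come from a sampling model and a
    reward model.
    """
    if len(candidates) != len(rewards):
        raise ValueError("candidates and rewards must have equal length")
    probs = soft_bon_probs(rewards, beta)
    idx = rng.choices(range(len(candidates)), weights=probs, k=1)[0]
    return candidates[idx], idx
```

With `beta = 0` every candidate is equally likely, and as `beta` grows the choice concentrates on the highest-reward candidate, which recovers ordinary best-of-$n$.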

Code & Models

Repositories

j-geuter/GSI (GitHub)

Videos

Guided Speculative Inference for Efficient Test-Time Alignment of LLMs (SlidesLive)