Greedy Multi-Path Block Verification for Faster Decoding in Speculative Sampling

Rahul Thomas; Arka Pal

arXiv:2602.16961 · cs.IT · February 20, 2026

TL;DR

This paper introduces greedy multi-path block verification (GBV), a verification algorithm for speculative decoding that extends block verification to multiple draft paths and yields significant speedups in autoregressive decoding.

Contribution

It extends block verification to off-path probabilities, formulates a greedy multi-path verification approach, and demonstrates substantial empirical improvements over existing methods.

Findings

01. GBV improves block verification efficiency by over 30%.
02. Decoding walltimes are reduced by over 15%.
03. On Llama-3 70B, GBV outperforms state-of-the-art methods by more than 15%.

Abstract

The goal of $L$-step speculative decoding is to accelerate autoregressive decoding of a target model by using a cheaper draft model to generate a candidate path of $L$ tokens. A verification algorithm based on the target and draft model probabilities accepts a prefix of the candidate path, and an additional correction token is sampled from a residual distribution to ensure that the final output follows the target distribution. While standard speculative decoding uses a verification rule that is applied independently at each token on the path, a recent extension called block verification (BV) uses a joint condition involving all sampled on-path probabilities. BV was shown to be optimal over all verification algorithms that use only on-path probabilities, improving on standard speculative decoding. In this work, we first show that block verification is…
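The abstract contrasts token-wise verification with block verification. The Python sketch below implements both single-path rules on a toy two-token vocabulary as a minimal illustration. It is not the authors' code. `block_verify` follows our reading of block verification as described by Sun et al. (2024b). The function names, the toy distributions, and the assumption that draft and target distributions do not depend on context are all hypothetical simplifications. GBV itself is not modeled.

```python
import random
from typing import List, Sequence, Tuple

Dist = Sequence[float]  # probability vector over a toy vocabulary


def sample(weights: Sequence[float], rng: random.Random) -> int:
    """Draw a token index; weights need not be normalized."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def residual(p: Dist, q: Dist, scale: float = 1.0) -> List[float]:
    """Unnormalized residual max(scale * p(x) - q(x), 0) over the vocabulary."""
    return [max(scale * px - qx, 0.0) for px, qx in zip(p, q)]


def tokenwise_verify(draft, q_dists, p_dists, rng) -> Tuple[List[int], int]:
    """Standard speculative sampling: accept each draft token independently
    with probability min(1, p/q); at the first rejection, sample the
    correction token from the normalized residual max(p - q, 0)."""
    for i, x in enumerate(draft):
        p, q = p_dists[i], q_dists[i]
        if rng.random() > min(1.0, p[x] / q[x]):
            return list(draft[:i]), sample(residual(p, q), rng)
    # Every draft token was accepted: the extra token comes from the target.
    return list(draft), sample(p_dists[len(draft)], rng)


def block_verify(draft, q_dists, p_dists, rng) -> Tuple[List[int], int]:
    """Block verification (our reading of Sun et al., 2024b). A joint weight
    p_i is carried along the path, and the accepted length tau is the last
    position i whose coin flip with probability h_i succeeds."""
    L = len(draft)
    p_i, tau, weights = 1.0, 0, [1.0]  # weights[i] holds p_i, with p_0 = 1
    for i in range(1, L + 1):
        x = draft[i - 1]
        p_i = min(1.0, p_i * p_dists[i - 1][x] / q_dists[i - 1][x])
        weights.append(p_i)
        if i < L:
            mass = sum(residual(p_dists[i], q_dists[i], p_i))
            denom = mass + 1.0 - p_i
            # 0/0 only when p_i = 1 and p = q at position i; a later position
            # is then always accepted, so the value of h_i does not matter.
            h_i = mass / denom if denom > 0.0 else 0.0
        else:
            h_i = p_i
        if rng.random() <= h_i:
            tau = i
    if tau == L:
        return list(draft), sample(p_dists[L], rng)
    res = residual(p_dists[tau], q_dists[tau], weights[tau])
    return list(draft[:tau]), sample(res, rng)


def estimate(verify, p: Dist, q: Dist, L: int, trials: int, seed: int = 0):
    """Monte Carlo estimate of block efficiency (tokens emitted per target
    call) and of the frequency of token 0 in the first output position.
    Assumes context-independent p and q purely to keep the toy small."""
    rng = random.Random(seed)
    total = first_is_zero = 0
    for _ in range(trials):
        draft = [sample(q, rng) for _ in range(L)]
        accepted, extra = verify(draft, [q] * L, [p] * (L + 1), rng)
        out = accepted + [extra]
        total += len(out)
        first_is_zero += out[0] == 0
    return total / trials, first_is_zero / trials


if __name__ == "__main__":
    target, drafter = [0.5, 0.5], [0.8, 0.2]  # hypothetical toy distributions
    for name, fn in [("token-wise", tokenwise_verify), ("block", block_verify)]:
        eff, first0 = estimate(fn, target, drafter, L=2, trials=100_000)
        print(f"{name}: block efficiency ~ {eff:.3f}, P(first = 0) ~ {first0:.3f}")
```

By hand calculation for this toy setup (L = 2, target (0.5, 0.5), draft (0.8, 0.2)), the exact block efficiencies are 2.19 tokens per target call for token-wise verification and 2.25 for block verification. Both rules emit token 0 first with probability 0.5, which matches the target. The printed estimates should therefore sit close to these values.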

Peer Reviews

Decision: Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 3

Strengths

The writing is clear. The paper provides a detailed theoretical analysis and presents an efficient method called greedy multi-path block verification.

Weaknesses

The evaluation is limited. First, the paper uses only block efficiency and wall time as metrics; other metrics, such as acceptance rates, should also be reported. Moreover, the paper evaluates only one model family (OPT) and only relatively small models, such as the 6.7B variant; the effectiveness of the proposed approach should be investigated on larger models. Furthermore, the paper does not consider some important factors, e.g., temperature. Different temperatures may lead to di…
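The two metrics named in this review can be related in closed form for standard token-wise verification. Assume each draft token is accepted independently with the same probability α = Σ_x min(p(x), q(x)). Under that simplifying assumption, block efficiency with L draft tokens is the geometric sum 1 + α + ... + α^L. The Python sketch below computes both quantities. It is not part of the paper, its function names and toy distributions are hypothetical, and it does not model BV or GBV, whose acceptance rule is joint across the path.

```python
def acceptance_rate(p, q):
    """Per-token acceptance probability of token-wise speculative sampling,
    sum_x min(p(x), q(x)), i.e. 1 - total variation distance between p and q."""
    return sum(min(px, qx) for px, qx in zip(p, q))


def tokenwise_block_efficiency(alpha, L):
    """Expected tokens per target call (accepted drafts + 1 correction/bonus)
    for L draft tokens, assuming each draft token is accepted independently
    with the same probability alpha."""
    if alpha >= 1.0:
        return float(L + 1)  # every draft token accepted, plus the bonus token
    return (1.0 - alpha ** (L + 1)) / (1.0 - alpha)


if __name__ == "__main__":
    alpha = acceptance_rate([0.5, 0.5], [0.8, 0.2])  # hypothetical toy models
    for L in (1, 2, 4, 8):
        print(L, round(tokenwise_block_efficiency(alpha, L), 4))
```

Under this assumption, a fixed acceptance rate determines block efficiency for token-wise verification. Reporting both metrics becomes informative for verification schemes like BV, where that simple relation no longer holds.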

Reviewer 02 · Rating 4 · Confidence 3

Strengths

- Clear formalization and optimality in the single-path setting. The LP view cleanly isolates prefix matching as the true bottleneck. Theorems 3.3 and 3.4 together pin down BV as optimal among all valid single-path algorithms, not just on-path-restricted ones.
- Multi-path decomposition. The paper’s factorization (randomized path selection to induce a “skewed draft”, followed by single-path BV) is conceptually neat and technically useful for designing approximations.
- Practical algorithm & evidence. GB…

Weaknesses

1. Related-work coverage needs to be tightened and comparisons made precise. The paper states that tree verification (Hu & Huang, 2024) improves over token-wise verification but is provably worse than block verification (BV). Sun et al. (2024b) prove BV’s optimality among single-path, on-path verification algorithms, but that result does not by itself imply a strict separation from tree verification. If this is the intended positioning, please give a crisp statement of assumptions and regimes…

Reviewer 03 · Rating 4 · Confidence 4

Strengths

- The information-agnostic LP cleanly characterizes feasible node budgets with a theoretical guarantee.
- The appendices provide solid derivations and decomposition lemmas.
- This paper is overall well-written and easy to follow.

Weaknesses

- Results use only OPT models and three academic datasets; evaluation on modern LLMs (e.g., the Llama-2/3/4 and Qwen2.5/3 families) and on diverse tasks (long-context, multilingual, tool use) would strengthen external validity.
- K=4 improves block efficiency but hurts wall time (batch overhead dominates); a more in-depth analysis of K is needed for a deeper understanding of the method.
- Reproducibility would benefit from open code and configurations to validate GBV’s efficiency in other ha…
