Global Resolution: Optimal Multi-Draft Speculative Sampling via Convex Minimization

Rahul Krishna Thomas; Arka Pal

arXiv:2511.15898·cs.LG·November 21, 2025

Open Access

TL;DR

This paper introduces a convex optimization approach to multi-draft speculative sampling for large language models, achieving high acceptance rates and low latency by efficiently solving the optimal transport problem.

Contribution

Using polymatroid theory, it reformulates the optimal transport linear program (OTLP), which has over $V^{n}$ variables, as a convex optimization problem, enabling practical multi-draft sampling with high acceptance and efficiency.

Findings

01. Achieves a 90% acceptance rate in multi-draft sampling.

02. Reduces overhead to under 100 ms per token.

03. Provides a scalable algorithm for optimal n-draft sampling.

Abstract

Speculative sampling reduces the latency of autoregressive decoding for target model LLMs without sacrificing inference quality, by using a cheap draft model to suggest a candidate token and a verification criterion to accept or resample this token. To improve acceptance and decoding efficiency, recent work has explored the multi-draft extension, where at each step $n$ draft tokens are generated, and the verification criterion is a distribution conditioned on these. When this criterion maximizes the probability of accepting some draft token, it is called the optimal transport (OT). However, finding the OT is difficult, as it is the solution of a linear program (OTLP) in over $V^{n}$ variables, with $V$ being the vocabulary size. Two recent theoretical works have reframed the OTLP in terms of importance sampling or subset selection. In this work, we prove that these formulations are…
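The single-draft procedure described at the start of the abstract can be sketched as follows. This is a minimal illustration of the standard accept-or-resample rule from the speculative sampling literature, not the paper's multi-draft algorithm. The function names and the toy distributions `p` (target) and `q` (draft) are assumptions introduced here for illustration. In this single-draft case the acceptance probability is the sum over tokens of min(p, q), which is the quantity the multi-draft optimal transport generalizes.

```python
import random


def acceptance_probability(p, q):
    """Probability that the single draft token is accepted: sum_x min(p(x), q(x))."""
    return sum(min(pi, qi) for pi, qi in zip(p, q))


def residual_distribution(p, q):
    """Normalized max(p - q, 0), the distribution used to resample after a rejection."""
    excess = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
    total = sum(excess)
    if total == 0.0:
        # p == q, so a rejection can never occur; fall back to p.
        return list(p)
    return [e / total for e in excess]


def speculative_step(p, q, rng=random):
    """Draw one token distributed according to p, using a draft sampled from q.

    Returns (token, accepted), where accepted says whether the draft was kept.
    """
    vocab = range(len(p))
    draft = rng.choices(vocab, weights=q)[0]
    # Accept the draft with probability min(1, p(draft) / q(draft)).
    if rng.random() < min(1.0, p[draft] / q[draft]):
        return draft, True
    # Otherwise resample from the residual so the output still follows p.
    token = rng.choices(vocab, weights=residual_distribution(p, q))[0]
    return token, False


if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]  # hypothetical target distribution
    q = [0.2, 0.2, 0.6]  # hypothetical draft distribution
    print(acceptance_probability(p, q))
    print(speculative_step(p, q, random.Random(0)))
```

Because the output token always follows the target distribution `p`, the draft model only affects how often the cheap draft is accepted, not the quality of the sample.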

Taxonomy

Topics: Machine Learning and Algorithms · Complexity and Algorithms in Graphs · Adversarial Robustness in Machine Learning