Not-a-Bandit: Provably No-Regret Drafter Selection in Speculative Decoding for LLMs

Hongyi Liu; Jiaji Huang; Zhen Jia; Youngsuk Park; Yu-Xiang Wang

arXiv:2510.20064·cs.LG·April 24, 2026

TL;DR

This paper introduces a provably no-regret online algorithm for selecting draft models in speculative decoding for large language models, improving efficiency and accuracy over existing bandit-based methods across various datasets.

Contribution

It presents a novel algorithm that accurately evaluates all draft models, not only the one chosen, without extra queries to the target model. This lets it outperform bandit-based approaches while keeping computational overhead low.

Findings

01 The method outperforms the EAGLE3 and BanditSpec baselines on diverse datasets.

02 The approach is applicable to various speculative decoding methods.

03 Experimental results show significant improvements on long reasoning tasks.

Abstract

Speculative decoding is widely used to accelerate large language model (LLM) inference. In this work, we focus on the online draft model selection problem in speculative decoding. We design an algorithm that provably competes with the best draft model in hindsight for each query, in terms of either the token acceptance probability or the expected acceptance length. In particular, we show that we can accurately evaluate all draft models, rather than only the chosen one, without incurring additional queries to the target model. This allows us to improve exponentially over the existing bandit-based approach as the number of draft models increases. Our approach is generically applicable to any speculative decoding method (single-draft, multi-draft, and draft-tree). Moreover, we design system-efficient versions of online learners and demonstrate that the overhead in computation and latency…
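The full-information idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes the online learner is Hedge (exponential weights), which the linked repository name suggests but this page does not confirm. It uses the standard single-draft speculative sampling identity that a drafted token is accepted with probability sum_x min(p(x), q(x)), where p is the target distribution and q the drafter distribution. The names `acceptance_prob`, `HedgeSelector`, and `simulate`, and the toy distributions, are hypothetical.

```python
import math
import random


def acceptance_prob(target, draft):
    """Single-draft token acceptance probability: sum_x min(p(x), q(x))."""
    return sum(min(p, q) for p, q in zip(target, draft))


class HedgeSelector:
    """Exponential-weights learner over drafters (assumed learner, see lead-in)."""

    def __init__(self, num_drafters, learning_rate):
        self.eta = learning_rate
        self.log_weights = [0.0] * num_drafters

    def probabilities(self):
        # Subtract the max log-weight before exponentiating for stability.
        m = max(self.log_weights)
        exp_w = [math.exp(w - m) for w in self.log_weights]
        total = sum(exp_w)
        return [w / total for w in exp_w]

    def choose(self, rng):
        probs = self.probabilities()
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]

    def update(self, losses):
        # Full-information update: every drafter receives a loss each round,
        # not only the drafter that was actually used.
        for i, loss in enumerate(losses):
            self.log_weights[i] -= self.eta * loss


def simulate(num_steps=500, seed=0):
    rng = random.Random(seed)
    # Toy next-token distributions over a 3-token vocabulary (made up).
    target = [0.7, 0.2, 0.1]
    drafters = [
        [0.10, 0.10, 0.80],  # poor match to the target
        [0.65, 0.25, 0.10],  # close match to the target
        [0.34, 0.33, 0.33],  # near-uniform
    ]
    # A standard Hedge step size for losses in [0, 1] and a known horizon.
    eta = math.sqrt(8 * math.log(len(drafters)) / num_steps)
    selector = HedgeSelector(len(drafters), eta)
    for _ in range(num_steps):
        chosen = selector.choose(rng)  # this drafter would propose tokens
        # The target distribution is already available from verification, so
        # each drafter can be scored against it without another target call.
        losses = [1.0 - acceptance_prob(target, q) for q in drafters]
        selector.update(losses)
    return selector.probabilities()


if __name__ == "__main__":
    print(simulate())
```

In this toy run the weight concentrates on the second drafter, the one whose distribution is closest to the target. For context, the textbook regret bound for Hedge grows as sqrt(T log N) over T rounds and N experts, while bandit algorithms such as EXP3 pay roughly sqrt(T N log N). This logarithmic versus polynomial dependence on N is one plausible reading of the "exponential" improvement claimed in the abstract.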

Code & Models

Models

🤗
aladinggit/hedgespec_eagle_drafters
model· ♡ 1

Videos

Not-a-Bandit: Provably No-Regret Drafter Selection in Speculative Decoding for LLMs· slideslive