BanditSpec: Adaptive Speculative Decoding via Bandit Algorithms

Yunlong Hou; Fengzhuo Zhang; Cunxiao Du; Xuan Zhang; Jiachun Pan; Tianyu Pang; Chao Du; Vincent Y. F. Tan; Zhuoran Yang

arXiv:2505.15141 · cs.LG · November 21, 2025


TL;DR

BanditSpec is a training-free framework that uses bandit algorithms to choose the hyperparameter configuration for speculative decoding in large language models adaptively, as text is being generated.

Contribution

It formulates hyperparameter selection for speculative decoding as a multi-armed bandit problem and designs two selection algorithms, UCBSpec and EXP3Spec, which are analyzed in terms of a new quantity, the stopping time regret. No additional training is required.
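To make the bandit formulation concrete, here is a minimal sketch of a generic UCB1 loop in which each arm stands for one candidate speculative decoding configuration. This is not the paper's UCBSpec. The Bernoulli reward is a toy stand-in for a normalized acceptance signal, and the names `ucb1_select`, `run_ucb1`, and `accept_rates` are hypothetical.

```python
import math
import random


def ucb1_select(counts, means, t):
    """Return the arm with the largest UCB1 index at round t (1-based)."""
    # Play every arm once before using confidence bounds.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return max(
        range(len(counts)),
        key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
    )


def run_ucb1(accept_rates, rounds, seed=0):
    """Simulate UCB1 over configurations with assumed Bernoulli rewards.

    accept_rates[i] is a made-up success probability for configuration i;
    a real system would observe a reward from the decoding step instead.
    """
    rng = random.Random(seed)
    k = len(accept_rates)
    counts = [0] * k
    means = [0.0] * k
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, means, t)
        reward = 1.0 if rng.random() < accept_rates[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the empirical mean reward.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means


if __name__ == "__main__":
    counts, means = run_ucb1([0.2, 0.5, 0.8], rounds=2000)
    print("pulls per configuration:", counts)
```

In this toy setting the configuration with the highest assumed acceptance rate ends up with most of the pulls, which is the behavior a regret bound formalizes.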

Findings

01 Effective hyperparameter adaptation with bandit algorithms

02 Theoretical regret bounds established for proposed methods

03 Empirical results show near-oracle performance in LLM inference

Abstract

Speculative decoding has emerged as a popular method to accelerate the inference of Large Language Models (LLMs) while retaining their superior text generation performance. Previous methods either adopt a fixed speculative decoding configuration regardless of the prefix tokens, or train draft models in an offline or online manner to align them with the context. This paper proposes a training-free online learning framework to adaptively choose the configuration of the hyperparameters for speculative decoding as text is being generated. We first formulate this hyperparameter selection problem as a Multi-Armed Bandit problem and provide a general speculative decoding framework BanditSpec. Furthermore, two bandit-based hyperparameter selection algorithms, UCBSpec and EXP3Spec, are designed and analyzed in terms of a novel quantity, the stopping time regret. We upper bound this regret under…
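The abstract names an EXP3-based selector alongside the UCB-based one. As a rough illustration of the underlying idea, the following is a sketch of the textbook EXP3 algorithm for rewards in [0, 1]. It is not the paper's EXP3Spec. The reward function, the exploration rate `gamma`, and all function names are assumptions made for this example.

```python
import math
import random


def exp3_probs(weights, gamma):
    """Mix the normalized weights with uniform exploration."""
    total = sum(weights)
    k = len(weights)
    return [(1.0 - gamma) * w / total + gamma / k for w in weights]


def run_exp3(reward_fn, k, rounds, gamma=0.1, seed=0):
    """Run textbook EXP3 for `rounds` steps over k arms.

    reward_fn(arm, t, rng) is a hypothetical callback that must return a
    reward in [0, 1] for the chosen arm; it stands in for whatever signal
    a decoding step would provide.
    """
    rng = random.Random(seed)
    weights = [1.0] * k
    pulls = [0] * k
    for t in range(rounds):
        probs = exp3_probs(weights, gamma)
        arm = rng.choices(range(k), weights=probs)[0]
        reward = reward_fn(arm, t, rng)
        # Importance-weighted estimate keeps the reward estimate unbiased.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / k)
        # Rescale so the largest weight is 1; probabilities are unchanged.
        top = max(weights)
        weights = [w / top for w in weights]
        pulls[arm] += 1
    return pulls, exp3_probs(weights, gamma)


if __name__ == "__main__":
    # Toy reward: only configuration 2 ever pays off.
    pulls, probs = run_exp3(lambda arm, t, rng: float(arm == 2), k=3, rounds=2000)
    print("pulls:", pulls)
    print("final probabilities:", probs)
```

Unlike UCB1, EXP3 makes no stochastic assumption on the rewards, which is why the two families are typically analyzed under different reward models.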


Videos

BanditSpec: Adaptive Speculative Decoding via Bandit Algorithms · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques

Methods: ADaptive gradient method with the OPTimal convergence rate · ALIGN