TapOut: A Bandit-Based Approach to Dynamic Speculative Decoding

Aditya Sridhar; Nish Sinnadurai; Sean Lie; Vithursan Thangarasa

arXiv:2511.02017 · cs.LG · November 5, 2025

Open Access

TL;DR

TapOut introduces a bandit-based, hyperparameter-free method for dynamically optimizing speculative decoding in large language models, significantly improving speedups without extensive tuning.

Contribution

It presents a novel online bandit algorithm for dynamic speculation policy selection that is training-free and adaptable across models and datasets.
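
To make the idea of online policy selection concrete, here is a minimal sketch of a bandit choosing among speculation policies. It assumes UCB1 as the bandit rule and a hypothetical reward in [0, 1] (for example, the fraction of drafted tokens the target model accepts). This summary does not state which bandit variant, arms, or reward TapOut uses, so `UCB1Selector`, `simulate`, and the Bernoulli rewards are illustrative stand-ins only.

```python
import math
import random


class UCB1Selector:
    """UCB1 over a fixed set of arms (here: hypothetical speculation policies)."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms   # pulls per arm
        self.means = [0.0] * n_arms  # running mean reward per arm
        self.total = 0               # total pulls across all arms

    def select(self):
        # Play every arm once before relying on the confidence bound.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        # Mean reward (exploitation) plus a bonus that shrinks as an arm
        # is pulled more often (exploration).
        scores = [
            mean + math.sqrt(2.0 * math.log(self.total) / count)
            for mean, count in zip(self.means, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        # Incremental mean update; no learning rate to tune.
        self.counts[arm] += 1
        self.total += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def simulate(true_rates, rounds=2000, seed=0):
    """Toy environment: each arm pays a Bernoulli reward (an assumption)."""
    rng = random.Random(seed)
    bandit = UCB1Selector(len(true_rates))
    for _ in range(rounds):
        arm = bandit.select()
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0
        bandit.update(arm, reward)
    return bandit.counts


if __name__ == "__main__":
    # Three hypothetical policies with different average rewards.
    print(simulate([0.3, 0.5, 0.8]))
```

In a real decoder, each arm would wrap a stopping rule for drafting, and the reward would come from the verification step rather than a simulated coin flip.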

Findings

01. Achieves superior speedups compared to existing methods
02. Works effectively across diverse model pairs and datasets
03. Requires no hyperparameter tuning

Abstract

Speculative decoding accelerates LLMs by using a lightweight draft model to generate tokens autoregressively before verifying them in parallel with a larger target model. However, determining the optimal number of tokens to draft remains a key challenge limiting the approach's effectiveness. Dynamic speculative decoding aims to intelligently decide how many tokens to draft to achieve maximum speedups. Existing methods often rely on hand-tuned, sensitive thresholds (e.g., token entropy), which are costly to set and generalize poorly across models and domains. We propose TapOut, an online, training-free, plug-and-play algorithm for dynamic speculation policy selection using multi-armed bandits. Our approach employs a meta-algorithm that selects among multiple parameter-free dynamic speculation strategies based on past reward and exploration. We conduct extensive experiments across diverse…
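
Why the draft length matters can be illustrated with the standard back-of-the-envelope analysis of speculative decoding. The sketch below is not taken from this paper: it assumes each drafted token is accepted independently with probability `alpha`, and that one draft-model step costs `cost_ratio` times one target-model step. Under those assumptions, a round that drafts `gamma` tokens yields (1 - alpha^(gamma+1)) / (1 - alpha) tokens in expectation and costs gamma * cost_ratio + 1 target-step equivalents. All function names are hypothetical.

```python
def expected_tokens(alpha, gamma):
    """Expected tokens produced per target-model verification pass."""
    if alpha >= 1.0:
        return gamma + 1  # every draft accepted, plus one token from the target
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def expected_speedup(alpha, gamma, cost_ratio):
    """Expected speedup over plain autoregressive decoding with the target."""
    return expected_tokens(alpha, gamma) / (gamma * cost_ratio + 1.0)


def best_draft_length(alpha, cost_ratio, max_gamma=32):
    """Draft length in [0, max_gamma] that maximizes expected speedup."""
    return max(
        range(max_gamma + 1),
        key=lambda g: expected_speedup(alpha, g, cost_ratio),
    )


if __name__ == "__main__":
    for alpha in (0.5, 0.7, 0.9):
        g = best_draft_length(alpha, cost_ratio=0.05)
        print(alpha, g, round(expected_speedup(alpha, g, 0.05), 3))
```

Because the acceptance rate varies with the model pair, the domain, and even the position within a sequence, the best draft length shifts as well, and a fixed setting leaves speedup on the table. That gap is what dynamic methods aim to close.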

Taxonomy

Topics: Machine Learning and Data Classification · Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis