TapOut: A Bandit-Based Approach to Dynamic Speculative Decoding
Aditya Sridhar, Nish Sinnadurai, Sean Lie, Vithursan Thangarasa

TL;DR
TapOut introduces a bandit-based, hyperparameter-free method for dynamically choosing how many tokens to draft in speculative decoding for large language models, significantly improving speedups without manual threshold tuning.
Contribution
TapOut is an online, training-free multi-armed bandit meta-algorithm that selects among parameter-free dynamic speculation strategies, adapting across models and datasets.
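The bandit idea can be illustrated with a standard UCB1 selector over a set of speculation strategies ("arms"). This is a minimal sketch under stated assumptions, not TapOut's implementation: the paper's exact bandit algorithm, its set of arms, and its reward are not given here, so UCB1 is swapped in and the reward is assumed to lie in [0, 1] (for example, the fraction of drafted tokens the target accepts). The class name `UCB1Selector` and the simulated acceptance rates are hypothetical. UCB1 uses a fixed exploration bonus, so there is no constant to tune.

```python
import math
import random
from typing import List


class UCB1Selector:
    """Hypothetical UCB1 meta-policy choosing among speculation strategies (arms)."""

    def __init__(self, n_arms: int) -> None:
        self.counts: List[int] = [0] * n_arms      # times each arm was played
        self.values: List[float] = [0.0] * n_arms  # running mean reward per arm
        self.t = 0                                 # total rounds so far

    def select(self) -> int:
        self.t += 1
        # Play every arm once before confidence bounds are defined.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        # UCB1 score: empirical mean plus the fixed bonus sqrt(2 ln t / n).
        scores = [
            v + math.sqrt(2.0 * math.log(self.t) / n)
            for v, n in zip(self.values, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm: int, reward: float) -> None:
        # Incremental mean update; reward is assumed to lie in [0, 1].
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


if __name__ == "__main__":
    # Toy simulation: each arm stands in for a speculation strategy whose
    # per-step reward is drawn as a Bernoulli stand-in for acceptance.
    rng = random.Random(0)
    acceptance = [0.3, 0.5, 0.8]  # hypothetical per-strategy means
    selector = UCB1Selector(len(acceptance))
    for _ in range(5000):
        arm = selector.select()
        selector.update(arm, 1.0 if rng.random() < acceptance[arm] else 0.0)
    print("plays per arm:", selector.counts)
```

Over time the selector concentrates its plays on the strategy with the highest observed reward while still occasionally revisiting the others, which is the exploration/exploitation trade-off a meta-algorithm over speculation policies relies on.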
Findings
Achieves superior speedups compared to existing methods
Works effectively across diverse model pairs and datasets
Requires no hyperparameter tuning
Abstract
Speculative decoding accelerates LLMs by using a lightweight draft model to generate tokens autoregressively before verifying them in parallel with a larger target model. However, determining the optimal number of tokens to draft remains a key challenge limiting the approach's effectiveness. Dynamic speculative decoding aims to intelligently decide how many tokens to draft to achieve maximum speedups. Existing methods often rely on hand-tuned, sensitive thresholds (e.g., token entropy), which are costly to set and generalize poorly across models and domains. We propose TapOut, an online, training-free, plug-and-play algorithm for dynamic speculation policy selection using multi-armed bandits. Our approach employs a meta-algorithm that selects among multiple parameter-free dynamic speculation strategies based on past reward and exploration. We conduct extensive experiments across diverse…
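The draft-then-verify loop described in the abstract, and where a hand-tuned entropy threshold enters it, can be sketched as follows. This is a minimal illustration under stated assumptions, not TapOut's method: it uses greedy (argmax) verification rather than the rejection-sampling acceptance rule used for sampled decoding, represents the draft and target models as toy functions returning probability distributions, and the names `speculative_step` and `entropy_threshold` are hypothetical stand-ins for the kind of sensitive threshold the abstract says existing methods rely on.

```python
import math
from typing import Callable, List, Sequence

# A toy "model" maps a token context to a next-token probability distribution.
Model = Callable[[Sequence[int]], List[float]]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def argmax(probs: Sequence[float]) -> int:
    return max(range(len(probs)), key=probs.__getitem__)


def speculative_step(ctx: List[int], draft: Model, target: Model,
                     entropy_threshold: float, max_draft: int = 8) -> List[int]:
    """One greedy draft-then-verify step; returns the tokens to append to ctx."""
    # 1) Draft autoregressively. After the first token, stop as soon as the
    #    draft model's entropy exceeds the hand-set threshold.
    drafted: List[int] = []
    for _ in range(max_draft):
        probs = draft(ctx + drafted)
        if drafted and entropy(probs) > entropy_threshold:
            break
        drafted.append(argmax(probs))
    # 2) Verify. A real system scores all drafted positions in one parallel
    #    target pass; this sketch queries the target position by position.
    accepted: List[int] = []
    for tok in drafted:
        expected = argmax(target(ctx + accepted))
        if tok != expected:
            accepted.append(expected)  # target's correction ends the step
            return accepted
        accepted.append(tok)
    # Every draft was accepted, so the target adds one bonus token.
    accepted.append(argmax(target(ctx + accepted)))
    return accepted


if __name__ == "__main__":
    V = 4  # toy vocabulary size

    def target(ctx: Sequence[int]) -> List[float]:
        # Target always predicts (last token + 1) mod V with certainty.
        probs = [0.0] * V
        probs[(ctx[-1] + 1) % V] = 1.0
        return probs

    def draft(ctx: Sequence[int]) -> List[float]:
        # Draft matches the target on short contexts, then becomes uniform.
        return target(ctx) if len(ctx) < 3 else [1.0 / V] * V

    print(speculative_step([0], draft, target, entropy_threshold=1.0))  # prints [1, 2, 3]
```

With `entropy_threshold=1.0` the draft stops after two confident tokens, both accepted. Raising it to 2.0 returns the same three tokens, but the draft model now produces eight tokens, six of which are rejected or discarded; this is the sensitivity to a hand-set threshold that motivates choosing the policy online instead.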
Taxonomy
Topics: Machine Learning and Data Classification · Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis
