Efficient Adaptive Rejection Sampling for Accelerating Speculative Decoding in Large Language Models
Chendong Sun, Ali Mao, Lei Xu, Mingmin Chen

TL;DR
This paper presents EARS, an adaptive rejection sampling method that improves speculative decoding efficiency in large language models by dynamically adjusting acceptance thresholds based on model uncertainty, leading to significant throughput gains.
Contribution
EARS introduces a novel adaptive threshold mechanism for rejection sampling in speculative decoding, reducing random rejections without altering model architectures.
Findings
Achieves up to an 18.12% increase in inference throughput.
Maintains high accuracy, with only a 0.84% accuracy drop on GSM8K.
Seamlessly integrates into existing frameworks.
Abstract
Speculative Decoding is a prominent technique for accelerating the autoregressive inference of large language models (LLMs) by employing a fast draft model to propose candidate token sequences and a large target model to verify them in parallel. However, its core component -- the rejection sampling mechanism -- relies on a fixed, context-independent random threshold. This leads to a significant "random rejection" problem in high-uncertainty generation scenarios, where plausible candidate tokens are frequently rejected due to random chance, undermining inference efficiency. This paper introduces Efficient Adaptive Rejection Sampling (EARS), a novel method that dynamically adjusts the acceptance threshold by incorporating the target model's own predictive uncertainty, measured as 1 - max(P_target). By introducing a tolerance term proportional to this uncertainty, EARS intelligently…
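The acceptance rule described in the abstract can be illustrated with a minimal sketch. The abstract is truncated before it gives the exact formula, so the additive form of the relaxation, the function names, and the scaling parameter `beta` below are assumptions made for illustration, not the authors' implementation. The standard test accepts a draft token when a uniform random number `u` falls at or below `min(1, P_target(x) / P_draft(x))`; the adaptive variant sketched here widens that threshold by a tolerance proportional to `1 - max(P_target)`.

```python
def accept_standard(p_target_x: float, p_draft_x: float, u: float) -> bool:
    """Standard speculative-decoding test: accept if u <= min(1, p_target/p_draft)."""
    ratio = min(1.0, p_target_x / p_draft_x)
    return u <= ratio


def accept_adaptive(p_target: list[float], p_draft_x: float, token: int,
                    u: float, beta: float = 0.1) -> bool:
    """Adaptive test (illustrative form, assumed): relax the threshold by a
    tolerance proportional to the target model's uncertainty.

    p_target  : full target distribution over the vocabulary at this position
    p_draft_x : draft model's probability for the proposed token
    token     : index of the proposed token
    u         : uniform random draw in [0, 1)
    beta      : hypothetical scaling factor for the tolerance term
    """
    uncertainty = 1.0 - max(p_target)   # 1 - max(P_target), as in the abstract
    tolerance = beta * uncertainty      # assumed additive tolerance term
    ratio = min(1.0, p_target[token] / p_draft_x)
    return u <= ratio + tolerance


# High-uncertainty context: the target distribution is flat, so the tolerance is larger.
flat_target = [0.40, 0.35, 0.25]
# Low-uncertainty context: the target is confident, so the tolerance is nearly zero.
peaked_target = [0.98, 0.01, 0.01]

u = 0.72  # a fixed draw, used here in place of random.random() for reproducibility
print(accept_standard(flat_target[1], 0.5, u))           # ratio 0.70, rejected
print(accept_adaptive(flat_target, 0.5, token=1, u=u))   # threshold about 0.76, accepted
print(accept_adaptive(peaked_target, 0.5, token=1, u=u)) # threshold about 0.022, rejected
```

In this sketch the relaxation only matters when the target distribution is flat; when the target is confident the rule reduces to nearly the standard test. Relaxing the threshold means the output no longer follows the target distribution exactly, which is consistent with the small accuracy drop reported in the findings.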
Taxonomy
Topics: Topic Modeling · Speech Recognition and Synthesis · Natural Language Processing Techniques
