Acceptance Dynamics Across Cognitive Domains in Speculative Decoding
Saif Mahmoud

TL;DR
This empirical study investigates how task type influences acceptance probability in speculative decoding for large language models across four NLP domains, revealing task type as a key predictor of acceptance.
Contribution
It provides the first detailed analysis of acceptance dynamics across different NLP tasks, highlighting the impact of task type over tree depth in speculative decoding.
Findings
Task type is a stronger predictor of acceptance than tree depth.
The chat domain yields a higher expected accepted length than the other domains.
The entropy-acceptance correlation is weakly negative across domains.
Abstract
Speculative decoding accelerates large language model (LLM) inference. It uses a small draft model to propose a tree of future tokens. A larger target model then verifies these tokens in a single batched forward pass. Despite the growing body of work on speculative methods, the degree to which the cognitive characteristics of a task affect acceptance probability remains largely unexplored. We present an empirical study of tree-based speculative decoding acceptance dynamics. Our study spans four well-established NLP benchmark domains: code generation, mathematical reasoning, logical reasoning, and open-ended chat. We use TinyLlama-1.1B as the draft model and Llama-2-7B-Chat-GPTQ as the target. Across 99,768 speculative nodes collected from 200 prompts, we derive per-domain acceptance rates, expected accepted lengths, depth-acceptance profiles, and entropy-acceptance…
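To make the quantities named in the abstract concrete, the following is a minimal sketch of how per-domain acceptance rates, a depth-acceptance profile, and an entropy-acceptance correlation could be computed from logged speculative nodes. The record layout `(domain, depth, draft_entropy, accepted)`, the helper names `rate_by` and `pearson`, and the six toy records are all assumptions made for illustration; they are not the paper's logging format or data.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical record layout: (domain, depth, draft_entropy, accepted).
# These toy values are invented for illustration, not taken from the paper.
NODES = [
    ("chat", 1, 0.4, True),
    ("chat", 2, 0.9, True),
    ("chat", 3, 1.6, False),
    ("code", 1, 0.7, True),
    ("code", 2, 1.8, False),
    ("code", 3, 2.1, False),
]


def rate_by(nodes, key_index):
    """Fraction of accepted nodes, grouped by the field at key_index."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for node in nodes:
        key = node[key_index]
        totals[key] += 1
        hits[key] += int(node[3])  # node[3] is the accepted flag
    return {key: hits[key] / totals[key] for key in totals}


def pearson(xs, ys):
    """Pearson correlation; with a 0/1 variable this is the point-biserial r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Per-domain acceptance rate: group by the domain field (index 0).
domain_rates = rate_by(NODES, 0)

# Depth-acceptance profile: group by tree depth (index 1).
depth_profile = rate_by(NODES, 1)

# Entropy-acceptance correlation over all nodes.
entropy_corr = pearson(
    [node[2] for node in NODES],
    [float(node[3]) for node in NODES],
)

print(domain_rates)
print(depth_profile)
print(entropy_corr)
```

Grouping by a field index keeps one helper for both the per-domain and per-depth views. Comparing how much the rates vary across domains with how much they vary across depths is one simple way to probe which factor is the stronger predictor, although the paper may use a different analysis.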
