When, What, and How: Rethinking Retrieval-Enhanced Speculative Decoding
Min Fang, Zhihui Fu, Qibin Zhao, Jun Wang

TL;DR
ReSpec introduces an adaptive retrieval-enhanced speculative decoding framework that intelligently balances speed and accuracy in large language model inference by using entropy-guided triggers, feedback-driven candidate selection, and source-aware verification.
Contribution
This paper presents ReSpec, a novel adaptive framework that improves retrieval-enhanced speculative decoding by reducing unnecessary retrievals and balancing efficiency with output quality.
Findings
ReSpec runs over 33% faster than EAGLE-2 and over 25% faster than SAM-Decoding.
ReSpec maintains high output quality while significantly accelerating LLM inference.
The framework effectively balances accuracy and efficiency through adaptive decision-making.
Abstract
Speculative decoding (SD) has emerged as an effective technique to accelerate large language model (LLM) inference without compromising output quality. However, the achievable speedup largely depends on the effectiveness of the drafting model. Model-based methods like EAGLE-2 are accurate but costly, while retrieval-enhanced methods like SAM-Decoding rely on heuristic switching strategies that often trigger unnecessary retrievals. To address this, we propose ReSpec (Retrieval-enhanced Speculative Decoding), a novel framework that transforms heuristic drafter switching into adaptive decision-making. ReSpec features three core innovations: 1) An entropy-guided adaptive trigger quantifies contextual predictability to initiate retrieval only when uncertainty is low, avoiding costly low-quality speculations. 2) A feedback-driven candidate selection…
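The entropy-guided trigger described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of Shannon entropy over the next-token distribution, and the threshold value of 1.0 nats are all assumptions made for illustration. The abstract only states that retrieval is initiated when uncertainty is low.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def should_retrieve(next_token_probs, threshold=1.0):
    """Hypothetical trigger: retrieve only when the context is predictable.

    `threshold` is an assumed value, not one taken from the paper.
    Low entropy means the next token is easy to predict, so a cheap
    retrieval-based draft is more likely to be accepted.
    """
    return shannon_entropy(next_token_probs) < threshold


# A peaked distribution (predictable context) versus a flat one.
peaked = [0.97, 0.01, 0.01, 0.01]
flat = [0.25, 0.25, 0.25, 0.25]

print(should_retrieve(peaked))  # low entropy, so retrieval is triggered
print(should_retrieve(flat))    # high entropy, so retrieval is skipped
```

In this sketch, a skipped retrieval would fall back to the model-based drafter; how ReSpec actually switches between drafters is not specified in the text above.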
Taxonomy
Topics: Computational and Text Analysis Methods · Topic Modeling · Natural Language Processing Techniques
