Confidence-Modulated Speculative Decoding for Large Language Models
Jaydip Sen, Subhasis Dasgupta, Hetvi Waghela

TL;DR
This paper introduces a confidence-modulated speculative decoding method for large language models that adaptively adjusts drafting and verification based on uncertainty, leading to faster inference without quality loss.
Contribution
It presents an information-theoretic framework that dynamically modulates speculative decoding using entropy and margin-based uncertainty measures, improving efficiency and robustness.
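For reference, the two uncertainty measures named above are commonly defined as follows. This summary does not give the paper's exact formulation, so the notation here is an assumption: q_t is the drafter's next-token distribution at step t over vocabulary V, and v_(1), v_(2) are its most and second-most probable tokens.

```latex
H(q_t) = -\sum_{v \in \mathcal{V}} q_t(v)\,\log q_t(v),
\qquad
m(q_t) = q_t\big(v_{(1)}\big) - q_t\big(v_{(2)}\big)
```

Under these definitions, low entropy and a large margin both indicate a confident drafter, which is the condition under which drafting more tokens per iteration is less likely to trigger a rollback.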
Findings
Achieves significant speedups over standard speculative decoding.
Maintains or improves BLEU and ROUGE scores in experiments.
Reduces rollback frequency and enhances resource utilization.
Abstract
Speculative decoding has emerged as an effective approach for accelerating autoregressive inference by parallelizing token generation through a draft-then-verify paradigm. However, existing methods rely on static drafting lengths and rigid verification criteria, limiting their adaptability across varying model uncertainties and input complexities. This paper proposes an information-theoretic framework for speculative decoding based on confidence-modulated drafting. By leveraging entropy and margin-based uncertainty measures over the drafter's output distribution, the proposed method dynamically adjusts the number of speculatively generated tokens at each iteration. This adaptive mechanism reduces rollback frequency, improves resource utilization, and maintains output fidelity. Additionally, the verification process is modulated using the same confidence signals, enabling more flexible…
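To make the adaptive drafting idea concrete, here is a minimal sketch of how a draft length could be chosen from the drafter's output distribution. It is an illustration under stated assumptions, not the authors' implementation: the function names, the blending weight alpha, the bounds k_min and k_max, and the linear mapping from confidence to draft length are all hypothetical choices made for this example.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def margin(probs: Sequence[float]) -> float:
    """Gap between the largest and second-largest probabilities."""
    top = sorted(probs, reverse=True)
    return top[0] - (top[1] if len(top) > 1 else 0.0)


def confidence(probs: Sequence[float], alpha: float = 0.5) -> float:
    """Hypothetical confidence score in [0, 1].

    Blends (1 - normalized entropy) with the top-2 margin. The blend and
    the weight `alpha` are assumptions for illustration only.
    """
    n = len(probs)
    max_h = math.log(n) if n > 1 else 1.0  # entropy of the uniform distribution
    norm_h = entropy(probs) / max_h
    score = alpha * (1.0 - norm_h) + (1.0 - alpha) * margin(probs)
    return min(1.0, max(0.0, score))  # guard against floating-point drift


def draft_length(
    probs: Sequence[float],
    k_min: int = 1,
    k_max: int = 8,
    alpha: float = 0.5,
) -> int:
    """Map confidence linearly onto [k_min, k_max] speculative tokens."""
    c = confidence(probs, alpha)
    return k_min + round(c * (k_max - k_min))


if __name__ == "__main__":
    peaked = [0.97, 0.01, 0.01, 0.01]  # confident drafter
    flat = [0.25, 0.25, 0.25, 0.25]    # maximally uncertain drafter
    print("peaked:", draft_length(peaked))
    print("flat:  ", draft_length(flat))
```

With a peaked distribution the sketch drafts close to k_max tokens, and with a uniform one it falls back to k_min, which mirrors the qualitative behavior the abstract describes: draft aggressively when the drafter is confident, and conservatively when it is not.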
