Confidence-Modulated Speculative Decoding for Large Language Models

Jaydip Sen; Subhasis Dasgupta; Hetvi Waghela

arXiv:2508.15371 · cs.CL · August 26, 2025


TL;DR

This paper introduces a confidence-modulated speculative decoding method for large language models that adaptively adjusts drafting and verification based on uncertainty, leading to faster inference without quality loss.

Contribution

It presents an information-theoretic framework that dynamically modulates speculative decoding using entropy and margin-based uncertainty measures, improving efficiency and robustness.

Findings

01. Achieves significant speedups over standard speculative decoding.

02. Maintains or improves BLEU and ROUGE scores in experiments.

03. Reduces rollback frequency and enhances resource utilization.

Abstract

Speculative decoding has emerged as an effective approach for accelerating autoregressive inference by parallelizing token generation through a draft-then-verify paradigm. However, existing methods rely on static drafting lengths and rigid verification criteria, limiting their adaptability across varying model uncertainties and input complexities. This paper proposes an information-theoretic framework for speculative decoding based on confidence-modulated drafting. By leveraging entropy and margin-based uncertainty measures over the drafter's output distribution, the proposed method dynamically adjusts the number of speculatively generated tokens at each iteration. This adaptive mechanism reduces rollback frequency, improves resource utilization, and maintains output fidelity. Additionally, the verification process is modulated using the same confidence signals, enabling more flexible…
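
To make the adaptive drafting idea concrete, here is a minimal sketch of how entropy and margin over a drafter's next-token distribution could be mapped to a draft length. It is an illustration under stated assumptions, not the paper's formulation: the function names, the blending weight `alpha`, the bounds `k_min` and `k_max`, and the linear mapping from confidence to length are all hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def margin(probs):
    """Gap between the largest and second-largest probabilities."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1] if len(top) > 1 else top[0]


def confidence(probs, alpha=0.5):
    """Hypothetical confidence score in [0, 1].

    Blends (1 - normalized entropy) with the top-2 margin.
    The blend and the weight alpha are assumptions for illustration.
    """
    n = len(probs)
    max_h = math.log(n) if n > 1 else 1.0  # entropy of the uniform distribution
    norm_h = entropy(probs) / max_h
    score = alpha * (1.0 - norm_h) + (1.0 - alpha) * margin(probs)
    return min(1.0, max(0.0, score))  # clamp away floating-point drift


def draft_length(probs, k_min=1, k_max=8, alpha=0.5):
    """Map confidence linearly onto [k_min, k_max] (assumed mapping)."""
    c = confidence(probs, alpha)
    return k_min + round(c * (k_max - k_min))


if __name__ == "__main__":
    uncertain = [0.25, 0.25, 0.25, 0.25]  # flat distribution: draft few tokens
    peaked = [0.7, 0.1, 0.1, 0.1]         # moderately confident
    certain = [1.0, 0.0, 0.0, 0.0]        # one-hot: draft the maximum
    for name, dist in [("uncertain", uncertain), ("peaked", peaked), ("certain", certain)]:
        print(name, draft_length(dist))
```

In this sketch a flat drafter distribution yields the minimum draft length and a one-hot distribution yields the maximum, so fewer tokens are speculated when a rollback is more likely. A real system would compute these signals from the drafter's logits at each iteration.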
