Steering Pretrained Drafters during Speculative Decoding

Frédéric Berdoz; Peer Rheinboldt; Roger Wattenhofer

arXiv:2511.09844 · cs.LG · November 14, 2025

TL;DR

This paper proposes a lightweight dynamic alignment method that steers pretrained drafters with verifier information to raise token acceptance rates in speculative decoding, reporting up to 35% more accepted tokens under standard sampling and 22% more under greedy sampling at negligible computational overhead.

Contribution

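Introduces a steering vector mechanism that raises the acceptance rates of pretrained drafters in speculative decoding. The vector is computed from the verifier's hidden states and injected into the drafter, and the mechanism is compatible with existing models and architectures.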

Findings

1. Boosts accepted tokens by up to 35% under standard sampling.
2. Increases accepted tokens by 22% under greedy sampling.
3. Achieves these improvements with negligible computational overhead.
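
The findings count "accepted tokens" under two verification modes. As a rough illustration of what that count means, the sketch below uses the widely used speculative sampling rule (accept a drafted token x with probability min(1, p(x)/q(x))) and plain prefix matching for greedy decoding. The function names, the toy distributions, and the use of Python's `random.Random` are assumptions made for this example; the sketch does not reproduce the paper's evaluation code.

```python
import random


def count_accepted_greedy(draft_tokens, verifier_argmax):
    """Greedy verification: accept the longest prefix on which each drafted
    token equals the verifier's argmax token at the same position."""
    accepted = 0
    for drafted, target in zip(draft_tokens, verifier_argmax):
        if drafted != target:
            break
        accepted += 1
    return accepted


def count_accepted_sampling(draft_tokens, q_probs, p_probs, rng):
    """Standard speculative sampling: accept drafted token x with probability
    min(1, p[x] / q[x]), where q is the drafter's distribution and p the
    verifier's distribution at that position. Stop at the first rejection.
    Assumes q[x] > 0, which holds when x was sampled from q."""
    accepted = 0
    for x, q, p in zip(draft_tokens, q_probs, p_probs):
        if rng.random() < min(1.0, p[x] / q[x]):
            accepted += 1
        else:
            break
    return accepted


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy vocabulary of size 3; each list is a probability distribution.
    q = [[0.2, 0.5, 0.3], [0.1, 0.1, 0.8]]
    p = [[0.2, 0.5, 0.3], [0.1, 0.1, 0.8]]
    # Identical distributions: every drafted token is accepted.
    print(count_accepted_sampling([1, 2], q, p, rng))
    print(count_accepted_greedy([1, 2, 3], [1, 2, 0]))
```

Under this rule, the closer the drafter's distribution q is to the verifier's p, the more drafted tokens survive verification, which is why drafter-verifier alignment drives the accepted-token count.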

Abstract

Speculative decoding accelerates language model inference by separating generation into fast drafting and parallel verification. Its main limitation is drafter-verifier misalignment, which limits token acceptance and reduces overall effectiveness. While small drafting heads trained from scratch compensate with speed, they struggle when verification dominates latency or when inputs are out of distribution. In contrast, pretrained drafters, though slower, achieve higher acceptance rates thanks to stronger standalone generation capabilities, making them competitive when drafting latency is negligible relative to verification or communication overhead. In this work, we aim to improve the acceptance rates of pretrained drafters by introducing a lightweight dynamic alignment mechanism: a steering vector computed from the verifier's hidden states and injected into the pretrained drafter.…
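
To make the idea of a steering vector concrete, here is a minimal sketch under stated assumptions. The abstract only says that the vector is computed from the verifier's hidden states and injected into the drafter. The linear projection `W_proj`, the additive injection, the `scale` parameter, and all function names below are hypothetical choices for illustration and may differ from the paper's actual design.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def steering_vector(verifier_hidden, W_proj):
    """Assumed form: project a verifier hidden state (dimension d_v) into the
    drafter's hidden dimension (d_d) with a learned matrix of shape d_d x d_v."""
    return matvec(W_proj, verifier_hidden)


def inject(drafter_hidden, steer, scale=1.0):
    """Assumed form: add the scaled steering vector to a drafter hidden state."""
    return [h + scale * s for h, s in zip(drafter_hidden, steer)]


if __name__ == "__main__":
    # Toy sizes: verifier hidden dimension 3, drafter hidden dimension 2.
    W_proj = [[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]]
    verifier_hidden = [0.5, -1.0, 2.0]
    drafter_hidden = [1.0, 1.0]
    steer = steering_vector(verifier_hidden, W_proj)
    print(inject(drafter_hidden, steer))
```

In this form the only extra work per drafting round is one small projection and one vector addition, which is consistent in spirit with the "lightweight" framing, though the real cost depends on details the abstract does not give.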

Taxonomy

Topics: Topic Modeling · Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning