AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders

Yuezhou Hu; Jiaxin Guo; Xinyu Feng; Tuo Zhao

arXiv:2510.19779 · cs.CL · October 23, 2025

TL;DR

AdaSPEC introduces a selective knowledge distillation method that filters out difficult-to-fit tokens during distillation, improving the alignment between draft and target models and thereby the efficiency of speculative decoding across various tasks.

Contribution

AdaSPEC proposes a novel token filtering approach in knowledge distillation to better align draft models with target models for more efficient speculative decoding.
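The token-filtering idea can be illustrated with a small, hypothetical sketch in pure Python. It assumes (i) per-token forward KL(target || draft) as the distillation loss, (ii) a reference model of the draft's size that has already been distilled from the target, and (iii) a selection rule that keeps the `keep_ratio` fraction of positions with the largest gap between the draft's and the reference's loss. The abstract below is truncated before the exact criterion, so the scoring rule and the names `select_tokens`, `selective_kd_loss`, and `keep_ratio` are illustrative assumptions, not the authors' code.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Forward KL(p || q) for two discrete distributions given as lists of floats."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def select_tokens(target_probs, draft_probs, ref_probs, keep_ratio=0.5):
    """Return the sorted positions kept for distillation (hypothetical rule).

    Assumption: each position is scored by draft loss minus reference loss,
    both measured as KL from the target. A large gap means a same-sized model
    can fit the token but the draft has not yet, so the position is kept;
    positions with a small gap (already learned, or too hard even for the
    reference) are filtered out.
    """
    scores = [
        (kl_divergence(p, d) - kl_divergence(p, r), i)
        for i, (p, d, r) in enumerate(zip(target_probs, draft_probs, ref_probs))
    ]
    k = max(1, int(len(scores) * keep_ratio))  # number of positions to keep
    top = sorted(scores, reverse=True)[:k]     # largest gaps first
    return sorted(i for _, i in top)


def selective_kd_loss(target_probs, draft_probs, ref_probs, keep_ratio=0.5):
    """Mean forward KL(target || draft) over the kept positions only."""
    kept = select_tokens(target_probs, draft_probs, ref_probs, keep_ratio)
    return sum(kl_divergence(target_probs[i], draft_probs[i]) for i in kept) / len(kept)


if __name__ == "__main__":
    # Four positions over a toy two-token vocabulary.
    target = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1], [0.6, 0.4]]
    draft = [[0.5, 0.5], [0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
    ref = [[0.9, 0.1], [0.9, 0.1], [0.5, 0.5], [0.5, 0.5]]
    print(select_tokens(target, draft, ref, keep_ratio=0.5))  # → [0, 3]
    print(selective_kd_loss(target, draft, ref, keep_ratio=0.5))
```

In the toy data, position 1 is dropped because the draft already matches the target, and position 2 is dropped because even the reference cannot fit it; only positions 0 and 3, where the draft lags a learnable reference, contribute to the loss.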

Findings

01. Achieves up to 15% higher token acceptance rates.

02. Outperforms the state-of-the-art DistillSpec method across multiple tasks.

03. Effective across models of various sizes.

Abstract

Speculative Decoding (SD) accelerates large language model inference by employing a small draft model to generate predictions, which are then verified by a larger target model. The effectiveness of SD hinges on the alignment between these models, which is typically enhanced by Knowledge Distillation (KD). However, conventional KD methods aim to minimize the KL divergence between the draft and target models across all tokens, a goal that is misaligned with the true objective of SD, which is to maximize token acceptance rate. Therefore, draft models often struggle to fully assimilate the target model's knowledge due to capacity constraints, leading to suboptimal performance. To address this challenge, we propose AdaSPEC, a novel method that incorporates selective token filtering into the KD process. AdaSPEC utilizes a reference model to identify and filter out difficult-to-fit tokens,…
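The acceptance rate mentioned above can be made concrete with the standard speculative-sampling verification rule, assumed here because this excerpt does not spell out the decoding scheme. A token x proposed by the draft distribution q is accepted with probability min(1, p(x)/q(x)) under the target distribution p; on rejection, a replacement is drawn from the normalized residual max(0, p − q). Under this rule the per-token acceptance probability is Σ_x min(p(x), q(x)) = 1 − TV(p, q), a quantity that differs from the KL divergence conventional KD minimizes. The helpers `expected_acceptance`, `verify_token`, and `simulate` are hypothetical names for this sketch.

```python
import random


def expected_acceptance(p, q):
    """Per-token acceptance probability under speculative sampling:
    sum over the vocabulary of min(p(x), q(x)), i.e. 1 - TV(p, q)."""
    return sum(min(pi, qi) for pi, qi in zip(p, q))


def verify_token(p, q, rng):
    """One verification step: the target p checks a token proposed by draft q.

    Returns (token, accepted). The proposed token x ~ q is kept with
    probability min(1, p[x] / q[x]); otherwise a replacement is drawn from
    the residual max(0, p - q), which random.choices renormalizes.
    """
    vocab = range(len(q))
    x = rng.choices(vocab, weights=q)[0]
    if rng.random() < min(1.0, p[x] / q[x]):
        return x, True
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    return rng.choices(vocab, weights=residual)[0], False


def simulate(p, q, n=20000, seed=0):
    """Return (empirical acceptance rate, empirical output distribution)."""
    rng = random.Random(seed)
    accepted = 0
    counts = [0] * len(p)
    for _ in range(n):
        token, ok = verify_token(p, q, rng)
        accepted += ok
        counts[token] += 1
    return accepted / n, [c / n for c in counts]


if __name__ == "__main__":
    p = [0.6, 0.3, 0.1]  # toy target distribution
    q = [0.3, 0.3, 0.4]  # toy draft distribution
    print(expected_acceptance(p, q))
    print(simulate(p, q))
```

Running the simulation, the empirical acceptance rate approaches `expected_acceptance(p, q)` while the emitted tokens still follow the target distribution p, which is why a better-aligned draft speeds up decoding without changing the output distribution.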

Videos

AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders (SlidesLive)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Computational and Text Analysis Methods