TAPS: Task Aware Proposal Distributions for Speculative Sampling

Mohamad Zbib, Mohamad Bazzi, Ammar Mohanna, Hasan Abed Al Kader Hammoud, and Bernard Ghanem

arXiv:2603.27027·cs.CL·March 31, 2026

TL;DR

This paper investigates how a draft model's training data affects speculative decoding quality in autoregressive generation. It highlights the benefits of task-specific draft training and of inference-time strategies for combining specialized drafters.

Contribution

The paper shows that task-specific training improves draft-model acceptance length on the matching benchmarks, and that inference-time combination methods outperform naive averaging.

Findings

01

Task-specific training yields specialization: MathInstruct-trained drafters are strongest on reasoning benchmarks, while ShareGPT-trained drafters are strongest on MT-Bench.

02

Confidence-based routing outperforms entropy-based routing for draft selection.

03

Combining specialized drafters at inference improves decoding acceptance length.
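
To make the routing finding concrete, here is a minimal sketch of the two selection rules. This page does not give the paper's exact definitions, so the code assumes one common reading: confidence is a drafter's top-1 next-token probability, and entropy is the Shannon entropy of the same distribution. The drafter names and probabilities are hypothetical.

```python
import math


def confidence(probs):
    """Top-1 probability of a next-token distribution (assumed definition)."""
    return max(probs)


def entropy(probs):
    """Shannon entropy in nats of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def route_by_confidence(drafter_probs):
    """Select the drafter with the highest top-1 probability."""
    return max(drafter_probs, key=lambda name: confidence(drafter_probs[name]))


def route_by_entropy(drafter_probs):
    """Select the drafter with the lowest entropy."""
    return min(drafter_probs, key=lambda name: entropy(drafter_probs[name]))


# Hypothetical distributions over a 4-token vocabulary, not data from the paper.
drafter_probs = {
    "math_drafter": [0.60, 0.20, 0.10, 0.10],  # higher peak, heavier tail
    "chat_drafter": [0.55, 0.45, 0.00, 0.00],  # lower peak, mass on two tokens
}

chosen_by_confidence = route_by_confidence(drafter_probs)
chosen_by_entropy = route_by_entropy(drafter_probs)
print(chosen_by_confidence, chosen_by_entropy)
```

The two rules can disagree: here the first drafter has the higher peak probability, while the second has the lower entropy because its mass sits on only two tokens. Which signal better predicts acceptance by the target model is an empirical question that the paper answers in favor of confidence.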

Abstract

Speculative decoding accelerates autoregressive generation by letting a lightweight draft model propose future tokens that a larger target model then verifies in parallel. In practice, however, draft models are usually trained on broad generic corpora, which leaves it unclear how much speculative decoding quality depends on the draft training distribution. We study this question with lightweight HASS and EAGLE-2 drafters trained on MathInstruct, ShareGPT, and mixed-data variants, evaluated on MT-Bench, GSM8K, MATH-500, and SVAMP. Measured by acceptance length, task-specific training yields clear specialization: MathInstruct-trained drafts are strongest on reasoning benchmarks, while ShareGPT-trained drafts are strongest on MT-Bench. Mixed-data training improves robustness, but larger mixtures do not dominate across decoding temperatures. We also study how to combine specialized drafters…
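
The acceptance-length metric can be illustrated with a simplified sketch. It assumes greedy verification of a single chain of draft tokens and counts the one token the target model always contributes per cycle; HASS and EAGLE-2 draft token trees, so this is an approximation of the idea and not the paper's evaluation code. The function names and token IDs are hypothetical.

```python
def tokens_per_cycle(draft_tokens, target_tokens):
    """Tokens committed in one draft-verify cycle under greedy verification.

    target_tokens[i] is the target model's greedy choice at position i given
    the prefix plus draft_tokens[:i], so it has len(draft_tokens) + 1 entries.
    """
    accepted = 0
    for drafted, verified in zip(draft_tokens, target_tokens):
        if drafted != verified:
            break
        accepted += 1
    # The target always adds one token: the correction at the first mismatch,
    # or a bonus token when every draft token matched.
    return accepted + 1


def mean_acceptance_length(cycles):
    """Average tokens committed per cycle over (draft, target) pairs."""
    return sum(tokens_per_cycle(d, t) for d, t in cycles) / len(cycles)


# Hypothetical token IDs for three cycles.
cycles = [
    ([5, 7, 9], [5, 7, 2, 4]),  # two matches, then a correction: 3 tokens
    ([1, 2, 3], [1, 2, 3, 4]),  # all three match, plus a bonus: 4 tokens
    ([8, 8], [3, 8, 8]),        # immediate mismatch: 1 token
]
mean_length = mean_acceptance_length(cycles)
print(round(mean_length, 3))
```

A drafter whose proposals match the target more often commits more tokens per target forward pass, which is why a longer acceptance length translates into faster decoding.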

Code & Models

Repositories

moe-zbeeb/TAPS
github

Datasets

zbeeb/TAPS-Datasets
dataset · 123 downloads
