TAPS: Task Aware Proposal Distributions for Speculative Sampling
Mohamad Zbib, Mohamad Bazzi, Ammar Mohanna, Hasan Abed Al Kader Hammoud, and Bernard Ghanem

TL;DR
This paper studies how the training data of draft models affects speculative decoding quality in autoregressive generation. It highlights the benefits of task-specific draft training and of inference-time strategies for combining specialized drafters.
Contribution
It shows that task-specific training improves drafter acceptance length on matching benchmarks, and that inference-time methods for combining drafters outperform naive averaging.
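The released "Averaged-Checkpoint" drafters suggest that "naive averaging" means uniform averaging of drafter parameters, but this page does not confirm it. The sketch below is written under that assumption. Checkpoints are shown as plain dicts that map hypothetical parameter names to flat lists of floats, rather than as real model state.

```python
from typing import Dict, List

# Assumed representation: parameter name -> flat list of weights.
Checkpoint = Dict[str, List[float]]


def average_checkpoints(ckpts: List[Checkpoint]) -> Checkpoint:
    """Uniform (naive) parameter averaging across drafters with identical layouts."""
    if not ckpts:
        raise ValueError("need at least one checkpoint")
    keys = ckpts[0].keys()
    for ckpt in ckpts[1:]:
        if ckpt.keys() != keys:
            raise ValueError("checkpoints must share parameter names")
    n = len(ckpts)
    # Average element-wise across checkpoints for every parameter tensor.
    return {
        key: [sum(values) / n for values in zip(*(ckpt[key] for ckpt in ckpts))]
        for key in keys
    }
```

Uniform averaging yields a single drafter at no extra inference cost. It ignores which drafter suits the current prompt, and that is the gap inference-time combination methods target.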
Findings
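Task-specific training yields specialization: MathInstruct-trained drafters are strongest on reasoning benchmarks (GSM8K, MATH-500, SVAMP), while ShareGPT-trained drafters are strongest on MT-Bench.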
Confidence-based routing outperforms entropy-based routing for draft selection.
Combining specialized drafters at inference time increases acceptance length.
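The exact routing rules are not given here, so the following is an assumed illustration of the two criteria. A confidence router picks the drafter whose next-token distribution has the highest top-token probability. An entropy router picks the drafter with the lowest predictive entropy. The drafter names, the toy three-token vocabulary, and the per-step routing granularity are all hypothetical.

```python
import math
from typing import Dict, Sequence


def confidence(dist: Sequence[float]) -> float:
    """Top-token probability of a next-token distribution."""
    return max(dist)


def entropy(dist: Sequence[float]) -> float:
    """Shannon entropy in nats, skipping zero-probability tokens."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def route(drafter_dists: Dict[str, Sequence[float]], rule: str = "confidence") -> str:
    """Return the name of the drafter chosen for this step under the given rule."""
    if rule == "confidence":
        return max(drafter_dists, key=lambda name: confidence(drafter_dists[name]))
    if rule == "entropy":
        return min(drafter_dists, key=lambda name: entropy(drafter_dists[name]))
    raise ValueError(f"unknown rule: {rule}")


# Hypothetical per-step distributions from two specialized drafters.
example = {
    "math": [0.60, 0.40, 0.00],
    "chat": [0.70, 0.15, 0.15],
}
```

The two rules can disagree on this example. "chat" has the higher top-token probability (0.70 vs 0.60) but spreads its remaining mass more evenly, so its entropy is higher (about 0.82 vs 0.67 nats). Confidence routing therefore picks "chat", and entropy routing picks "math". Confidence tracks the single token that will actually be drafted, which is one plausible reason it could be the better selector.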
Abstract
Speculative decoding accelerates autoregressive generation by letting a lightweight draft model propose future tokens that a larger target model then verifies in parallel. In practice, however, draft models are usually trained on broad generic corpora, which leaves it unclear how much speculative decoding quality depends on the draft training distribution. We study this question with lightweight HASS and EAGLE-2 drafters trained on MathInstruct, ShareGPT, and mixed-data variants, evaluated on MT-Bench, GSM8K, MATH-500, and SVAMP. Measured by acceptance length, task-specific training yields clear specialization: MathInstruct-trained drafts are strongest on reasoning benchmarks, while ShareGPT-trained drafts are strongest on MT-Bench. Mixed-data training improves robustness, but larger mixtures do not dominate across decoding temperatures. We also study how to combine specialized drafters…
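The draft-then-verify loop above can be illustrated with the standard speculative sampling acceptance rule. Each drafted token x is accepted with probability min(1, p(x)/q(x)), where q is the drafter's distribution and p is the target's. On the first rejection, a replacement token is drawn from the renormalized residual max(0, p − q) and the step ends. If every drafted token is accepted, the target contributes one bonus token. This is a single-chain sketch on toy probability vectors with hypothetical function names. HASS and EAGLE-2 draft token trees and verify them with a tree-structured variant, which is not shown. Here acceptance length counts only the accepted draft tokens per verification step; some papers also count the target's bonus token.

```python
import random
from typing import List, Sequence, Tuple

Dist = Sequence[float]  # probability vector over a toy vocabulary


def sample(dist: Dist, rng: random.Random) -> int:
    """Draw a token index from a probability vector."""
    return rng.choices(range(len(dist)), weights=dist, k=1)[0]


def verify_chain(
    draft_tokens: List[int],
    draft_dists: List[Dist],   # q_i: drafter distribution at step i (q_i[token] > 0)
    target_dists: List[Dist],  # p_i: target distribution at step i, length k + 1
    rng: random.Random,
) -> Tuple[List[int], int]:
    """Return (tokens emitted this step, number of accepted draft tokens)."""
    emitted: List[int] = []
    for i, tok in enumerate(draft_tokens):
        p, q = target_dists[i], draft_dists[i]
        # Accept the drafted token with probability min(1, p(x) / q(x)).
        if rng.random() < min(1.0, p[tok] / q[tok]):
            emitted.append(tok)
            continue
        # Rejected: resample from the residual max(0, p - q), renormalized, then stop.
        residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
        total = sum(residual)  # > 0 whenever a rejection is possible
        emitted.append(sample([r / total for r in residual], rng))
        return emitted, i
    # All drafted tokens accepted: the target adds one bonus token.
    emitted.append(sample(target_dists[len(draft_tokens)], rng))
    return emitted, len(draft_tokens)


def mean_acceptance_length(accepted_per_step: List[int]) -> float:
    """Average number of accepted draft tokens per verification step."""
    return sum(accepted_per_step) / len(accepted_per_step)
```

The closer the drafter's q is to the target's p on a given task, the more often the ratio p(x)/q(x) stays near 1 and the longer the accepted runs. That is why the drafter's training distribution shows up directly in acceptance length.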
Code & Models
- 🤗 zbeeb/Hass-MathInstruct_20epochs (model)
- 🤗 zbeeb/Hass-ShareGPT_20epochs (model)
- 🤗 zbeeb/Hass-Sharegpt-Mathinstruct-20epochs (model)
- 🤗 zbeeb/Hass-Averaged-Checkpoint (model)
- 🤗 zbeeb/Eagle-MathInstruct_20epochs (model)
- 🤗 zbeeb/Eagle-Sharegpt-Mathinstruct-20epochs (model)
- 🤗 zbeeb/Eagle-ShareGPT_20epochs (model)
- 🤗 zbeeb/Eagle-Averaged-Checkpoint (model)
- 🤗 zbeeb/Eagle-Sharegpt-Mathinstruct-20epochs-140k (model)
- 🤗 zbeeb/Hass-Sharegpt-Mathinstruct-20epochs-140k (model)
