Multi-Drafter Speculative Decoding with Alignment Feedback

Taehyeon Kim; Hojung Jung; Se-Young Yun

arXiv:2604.05417·cs.CL·April 8, 2026

TL;DR

MetaSD extends speculative decoding for large language models by dynamically selecting among multiple drafters using alignment feedback, speeding up inference across diverse tasks while preserving the target model's generation quality.

Contribution

The paper introduces MetaSD, a unified framework that combines multiple drafters with a multi-armed bandit approach for more effective speculative decoding.

Findings

01 MetaSD outperforms single-drafter methods in experiments.

02 Dynamic resource allocation improves decoding efficiency.

03 Alignment feedback guides effective drafter selection.

Abstract

Speculative decoding (SD) accelerates large language model (LLM) inference by using a smaller model to draft future tokens, which are then verified by the target LLM. This preserves generation quality by accepting only aligned tokens. However, individual drafters, often trained for specific tasks or domains, exhibit limited effectiveness across diverse applications. To address this, we introduce MetaSD, a unified framework that integrates multiple drafters into the SD process. MetaSD dynamically allocates computational resources to heterogeneous drafters by leveraging alignment feedback and framing drafter selection as a multi-armed bandit problem. Extensive experiments show MetaSD consistently outperforms single-drafter approaches.
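
To make the bandit framing concrete, here is a minimal sketch of drafter selection as a multi-armed bandit. It is an illustration under stated assumptions, not the MetaSD algorithm: the abstract does not say which bandit policy or reward the authors use, so this sketch swaps in the standard UCB1 rule and takes the fraction of drafted tokens accepted by the target model as the alignment feedback. The names `UCB1DrafterSelector` and `simulate`, and the per-token acceptance probabilities, are hypothetical; real drafters and a real target LLM are replaced by a random simulation.

```python
import math
import random


class UCB1DrafterSelector:
    """UCB1 bandit over drafters (assumed policy, not taken from the paper)."""

    def __init__(self, num_drafters: int):
        self.counts = [0] * num_drafters    # times each drafter was chosen
        self.means = [0.0] * num_drafters   # running mean reward per drafter

    def select(self) -> int:
        # Try every drafter once before relying on confidence bounds.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        total = sum(self.counts)
        scores = [
            mean + math.sqrt(2.0 * math.log(total) / count)
            for mean, count in zip(self.means, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm: int, reward: float) -> None:
        # Incremental mean update; reward is expected to lie in [0, 1].
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def simulate(accept_probs, rounds=2000, draft_len=5, seed=0):
    """Simulate SD rounds with hypothetical per-token acceptance probabilities."""
    rng = random.Random(seed)
    selector = UCB1DrafterSelector(len(accept_probs))
    for _ in range(rounds):
        arm = selector.select()
        # Verification keeps the accepted prefix and stops at the first
        # rejected token, so count consecutive acceptances only.
        accepted = 0
        for _ in range(draft_len):
            if rng.random() < accept_probs[arm]:
                accepted += 1
            else:
                break
        # Assumed alignment feedback: fraction of the draft that was accepted.
        selector.update(arm, accepted / draft_len)
    return selector
```

Calling `simulate([0.3, 0.6, 0.9])` returns a selector whose `counts` show how often each simulated drafter was picked. In a real system the inner loop would be replaced by an actual draft-and-verify step, and the reward could be any bounded alignment signal.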
