Multi-Drafter Speculative Decoding with Alignment Feedback
Taehyeon Kim, Hojung Jung, Se-Young Yun

TL;DR
MetaSD enhances speculative decoding for large language models by dynamically integrating multiple drafters using alignment feedback, leading to improved inference speed and quality across diverse tasks.
Contribution
The paper introduces MetaSD, a unified framework that combines multiple drafters with a multi-armed bandit approach for more effective speculative decoding.
Findings
MetaSD outperforms single-drafter methods in experiments.
Dynamic resource allocation improves decoding efficiency.
Alignment feedback guides effective drafter selection.
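The drafter-selection idea in these findings can be illustrated with a generic bandit. The sketch below is not the paper's algorithm: it assumes a standard UCB1 rule, and it assumes the reward is the fraction of drafted tokens accepted in a round. The names `UCBDrafterSelector` and `simulate`, and the simulated acceptance rates, are hypothetical and exist only for illustration.

```python
import math
import random


class UCBDrafterSelector:
    """UCB1 over a fixed set of drafters (a generic bandit, assumed for illustration)."""

    def __init__(self, num_drafters, exploration=1.0):
        self.counts = [0] * num_drafters   # rounds in which each drafter was used
        self.means = [0.0] * num_drafters  # running mean reward per drafter
        self.exploration = exploration
        self.total = 0                     # total rounds played so far

    def select(self):
        # Use every drafter once before applying the UCB rule.
        for i, c in enumerate(self.counts):
            if c == 0:
                return i
        scores = [
            m + self.exploration * math.sqrt(2.0 * math.log(self.total) / c)
            for m, c in zip(self.means, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, drafter, reward):
        # Incremental mean update with a reward in [0, 1].
        self.counts[drafter] += 1
        self.total += 1
        self.means[drafter] += (reward - self.means[drafter]) / self.counts[drafter]


def simulate(accept_rates, rounds=2000, draft_len=4, seed=0):
    """Simulate feedback: each drafted token is accepted with a fixed probability,
    and drafting stops counting at the first rejection (assumed reward model)."""
    rng = random.Random(seed)
    selector = UCBDrafterSelector(len(accept_rates))
    for _ in range(rounds):
        k = selector.select()
        accepted = 0
        for _ in range(draft_len):
            if rng.random() < accept_rates[k]:
                accepted += 1
            else:
                break
        selector.update(k, accepted / draft_len)
    return selector.counts


if __name__ == "__main__":
    # Hypothetical per-token acceptance rates for three drafters.
    print(simulate([0.3, 0.8, 0.5]))
```

Under these assumptions the selector spends most rounds on the drafter with the highest acceptance rate while still occasionally sampling the others, which is the trade-off a bandit formulation is meant to manage.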
Abstract
Speculative decoding (SD) accelerates large language model (LLM) inference by using a smaller model to draft future tokens, which are then verified by the target LLM. This preserves generation quality by accepting only aligned tokens. However, individual drafters, often trained for specific tasks or domains, exhibit limited effectiveness across diverse applications. To address this, we introduce MetaSD, a unified framework that integrates multiple drafters into the SD process. MetaSD dynamically allocates computational resources to heterogeneous drafters by leveraging alignment feedback and framing drafter selection as a multi-armed bandit problem. Extensive experiments show MetaSD consistently outperforms single-drafter approaches.
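The draft-then-verify loop described in the abstract can be sketched as a single step. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes greedy verification (a drafted token is accepted only if it equals the target's own next token), and it models both the target and the drafter as hypothetical callables that map a token list to the next token. A real system would verify all drafted tokens in one batched forward pass of the target; the sequential calls here are only for readability.

```python
def speculative_step(target_next, drafter_next, prefix, draft_len=4):
    """Run one draft-then-verify step.

    Returns (new_tokens, num_accepted), where num_accepted is the number of
    drafted tokens the target agreed with. That count is one possible form of
    alignment feedback (an assumption, not taken from the paper).
    """
    # Draft phase: the small model proposes draft_len tokens autoregressively.
    draft = []
    ctx = list(prefix)
    for _ in range(draft_len):
        tok = drafter_next(ctx)
        draft.append(tok)
        ctx.append(tok)

    # Verify phase: accept drafted tokens until the first disagreement.
    new_tokens = []
    ctx = list(prefix)
    for tok in draft:
        expected = target_next(ctx)
        if tok != expected:
            # Replace the rejected token with the target's own choice and stop.
            new_tokens.append(expected)
            return new_tokens, len(new_tokens) - 1
        new_tokens.append(tok)
        ctx.append(tok)

    # Every drafted token was accepted, so the target adds one extra token.
    new_tokens.append(target_next(ctx))
    return new_tokens, draft_len


if __name__ == "__main__":
    # Toy models over digits: the target counts upward modulo 10.
    def target(ctx):
        return (ctx[-1] + 1) % 10

    # This toy drafter agrees with the target except right after the token 3.
    def drafter(ctx):
        return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10

    print(speculative_step(target, drafter, [0]))
```

Because every emitted token is either confirmed or supplied by the target, the output matches what the target alone would produce under greedy decoding; the drafter only affects how many tokens are produced per step.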
