SpecRouter: Adaptive Routing for Multi-Level Speculative Decoding in Large Language Models
Hang Wu, Jianian Zhu, Yinghui Li, Haojie Wang, Biao Hou, Jidong Zhai

TL;DR
SpecRouter introduces an adaptive, multi-level speculative decoding framework for large language models, dynamically optimizing inference paths based on real-time feedback to balance quality and latency.
Contribution
It proposes a novel adaptive routing approach with multi-level verification and synchronized state management to improve LLM inference efficiency.
Findings
Reduces inference latency compared to static methods.
Effectively balances quality and speed through dynamic model chaining.
Demonstrates promising preliminary experimental results.
Abstract
Large Language Models (LLMs) present a critical trade-off between inference quality and computational cost: larger models offer superior capabilities but incur significant latency, while smaller models are faster but less powerful. Existing serving strategies often employ fixed model scales or static two-stage speculative decoding, failing to dynamically adapt to the varying complexities of user requests or fluctuations in system performance. This paper introduces SpecRouter, a novel framework that reimagines LLM inference as an adaptive routing problem solved through multi-level speculative decoding. SpecRouter dynamically constructs and optimizes inference "paths" (chains of models) based on real-time feedback, addressing the limitations of static approaches. Our contributions are threefold: (1) An adaptive model chain scheduling mechanism that leverages performance…
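To make the idea of choosing a model chain from real-time feedback concrete, here is a minimal sketch. It is not the paper's scheduling algorithm, which the truncated abstract does not specify. It assumes a simple heuristic: keep a moving average of accepted tokens and latency per verification step for each candidate chain, and pick the chain with the lowest estimated latency per accepted token. The names `ChainStats`, `pick_chain`, the chain labels, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ChainStats:
    """Running feedback for one candidate model chain (hypothetical structure)."""
    name: str
    accepted_per_step: float  # moving average of tokens accepted per verification step
    latency_per_step: float   # moving average of seconds per verification step

    def update(self, accepted: int, latency: float, alpha: float = 0.2) -> None:
        # Exponential moving average, so recent steps dominate the estimate.
        self.accepted_per_step += alpha * (accepted - self.accepted_per_step)
        self.latency_per_step += alpha * (latency - self.latency_per_step)

    def cost_per_token(self) -> float:
        # Guard against division by zero when nothing has been accepted yet.
        return self.latency_per_step / max(self.accepted_per_step, 1e-9)


def pick_chain(chains: list[ChainStats]) -> ChainStats:
    """Return the chain with the lowest estimated latency per accepted token."""
    return min(chains, key=lambda c: c.cost_per_token())


# Illustrative numbers only (assumed, not measured results).
chains = [
    ChainStats("small->large", accepted_per_step=3.0, latency_per_step=0.06),
    ChainStats("small->medium->large", accepted_per_step=5.0, latency_per_step=0.08),
]
best = pick_chain(chains)
print(best.name)
```

Calling `update` after each verification step lets the choice shift as conditions change: a chain whose drafts start being rejected more often sees its cost per token rise and can lose its place to a shorter chain.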
Taxonomy
Topics: Big Data and Digital Economy · Natural Language Processing Techniques · Software System Performance and Reliability
