Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting
Fangcheng Liu, Yehui Tang, Zhenhua Liu, Yunsheng Ni, Kai Han, Yunhe Wang

TL;DR
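Kangaroo is a self-speculative decoding framework that accelerates large language model inference by drafting tokens with a fixed shallow sub-network of the model itself, combined with early exiting. It needs far fewer extra parameters than a separate draft model and preserves the target model's sampling distribution.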
Contribution
The paper proposes a self-speculative decoding method that drafts with a fixed shallow sub-network of the target model, a lightweight trained adapter, and early exiting, avoiding the cost of training a separate draft model.
Findings
Achieves up to 1.68x speedup on Spec-Bench
Uses 88.7% fewer additional parameters than Medusa-1
Preserves the target model's sampling distribution
Abstract
Speculative decoding has demonstrated its effectiveness in accelerating the inference of large language models while maintaining a consistent sampling distribution. However, the conventional approach of training a separate draft model to achieve a satisfactory token acceptance rate can be costly. Drawing inspiration from early exiting, we propose a novel self-speculative decoding framework Kangaroo, which uses a fixed shallow sub-network as a self-draft model, with the remaining layers serving as the larger target model. We train a lightweight and efficient adapter module on top of the sub-network to bridge the gap between the sub-network and the full model's representation ability. It is noteworthy that the inference latency of the self-draft model may no longer be negligible compared to the large model, necessitating strategies to increase the token acceptance rate while…
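The draft-then-verify loop described in the abstract can be illustrated with a minimal toy sketch. It is not the authors' implementation: target_next and draft_next are hypothetical stand-ins for the full model and the shallow sub-network with adapter, and max_draft and threshold are assumed parameters. The sketch covers greedy decoding only (sampling would need a rejection-sampling step), and dropping a low-confidence draft token before verification is a choice made here, not something the text above specifies.

```python
from typing import List, Tuple

VOCAB = 7  # toy vocabulary size (assumption for this sketch)

def target_next(prefix: List[int]) -> int:
    """Hypothetical full model: a deterministic toy rule for the next token."""
    return (sum(prefix) * 31 + len(prefix)) % VOCAB

def draft_next(prefix: List[int]) -> Tuple[int, float]:
    """Hypothetical shallow sub-network + adapter: returns (token, confidence)."""
    tok = target_next(prefix)
    if len(prefix) % 5 == 4:
        return (tok + 1) % VOCAB, 0.9  # confidently wrong
    if len(prefix) % 3 == 2:
        return tok, 0.3                # right but unsure
    return tok, 0.9                    # right and confident

def greedy_decode(prompt: List[int], max_new_tokens: int) -> List[int]:
    """Baseline: one full-model call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(target_next(tokens))
    return tokens

def self_speculative_decode(prompt: List[int], max_new_tokens: int,
                            max_draft: int = 6,
                            threshold: float = 0.6) -> Tuple[List[int], int]:
    """Greedy draft-then-verify loop; returns (tokens, verification rounds)."""
    tokens = list(prompt)
    target_len = len(prompt) + max_new_tokens
    rounds = 0
    while len(tokens) < target_len:
        # Draft phase: stop early once the draft confidence drops too low.
        draft: List[int] = []
        while len(draft) < max_draft:
            tok, conf = draft_next(tokens + draft)
            if conf < threshold:
                break  # early exit from drafting; the unsure token is dropped
            draft.append(tok)
        # Verify phase: a real system would check all drafted positions in
        # one parallel pass of the full model; here we loop for clarity.
        rounds += 1
        accepted: List[int] = []
        for tok in draft:
            expected = target_next(tokens + accepted)
            if tok != expected:
                accepted.append(expected)  # correct the first mismatch, stop
                break
            accepted.append(tok)
        else:
            # Every drafted token matched: the full model adds one more token.
            accepted.append(target_next(tokens + accepted))
        tokens.extend(accepted)
    return tokens[:target_len], rounds

if __name__ == "__main__":
    out, rounds = self_speculative_decode([1, 2, 3], 20)
    print(out == greedy_decode([1, 2, 3], 20), rounds)
```

Because every emitted token is either confirmed or replaced by the full model, the output matches plain greedy decoding exactly, while the number of verification rounds can be smaller than the number of generated tokens.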
Taxonomy
Topics: DNA and Biological Computing · Algorithms and Data Compression
Methods: Early exiting using confidence measures · Adapter
