GRIFFIN: Effective Token Alignment for Faster Speculative Decoding
Shijing Hu, Jingyang Li, Xingyu Xie, Zhihui Lu, Kim-Chuan Toh, Pan Zhou

TL;DR
GRIFFIN introduces a token alignment strategy for speculative decoding in large language models, significantly improving inference speed and draft token acceptance rates by addressing token misalignment issues.
Contribution
GRIFFIN proposes a framework that pairs a token-alignable training strategy with a token-alignable draft model, reducing token misalignment and improving the efficiency of speculative decoding in LLMs.
Findings
Average acceptance length improved by over 8%
Speedup ratio improved by more than 7%
Outperforms existing speculative decoding methods
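To make the "acceptance length" metric concrete, here is a minimal sketch of how it could be computed. It assumes greedy verification (a draft token is accepted only if it equals the target model's own choice at that position) and counts the one extra token the target model contributes per cycle. The function names and the toy token IDs are illustrative assumptions, not taken from the paper or its code.

```python
from typing import List, Sequence, Tuple


def accepted_prefix_len(draft: Sequence[int], target: Sequence[int]) -> int:
    """Count leading draft tokens that match the target model's choices."""
    n = 0
    for d, t in zip(draft, target):
        if d != t:
            break  # first mismatch ends the accepted prefix
        n += 1
    return n


def mean_acceptance_length(cycles: List[Tuple[Sequence[int], Sequence[int]]]) -> float:
    """Average tokens produced per draft-verify cycle.

    Assumption: each cycle yields the accepted draft prefix plus one token
    supplied by the target model itself.
    """
    lengths = [accepted_prefix_len(d, t) + 1 for d, t in cycles]
    return sum(lengths) / len(lengths)


# Toy data: (draft tokens, target model's tokens at the same positions).
cycles = [
    ([5, 9, 2, 7], [5, 9, 2, 7, 3]),  # all 4 draft tokens accepted
    ([5, 9, 2, 7], [5, 9, 4, 1, 8]),  # 2 accepted, mismatch at position 3
    ([1, 1, 1, 1], [6, 0, 0, 0, 0]),  # none accepted
]
print(mean_acceptance_length(cycles))  # 3.0
```

Under this definition, a longer accepted prefix means fewer target-model forward passes per generated token, which is why acceptance length and speedup tend to move together.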
Abstract
Speculative decoding accelerates inference in large language models (LLMs) by generating multiple draft tokens simultaneously. However, existing methods often struggle with token misalignment between the training and decoding phases, limiting their performance. To address this, we propose GRIFFIN, a novel framework that incorporates a token-alignable training strategy and a token-alignable draft model to mitigate misalignment. The training strategy employs a loss masking mechanism to exclude highly misaligned tokens during training, preventing them from negatively impacting the draft model's optimization. The token-alignable draft model introduces input tokens to correct inconsistencies in generated features. Experiments on LLaMA, Vicuna, Qwen and Mixtral models demonstrate that GRIFFIN achieves an average acceptance length improvement of over 8% and a speedup ratio exceeding 7%,…
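The loss masking idea in the abstract can be illustrated with a small sketch. The abstract does not state how "highly misaligned" tokens are identified, so the `keep` flags below are a hypothetical input supplied by the caller; the function name and toy numbers are likewise assumptions for illustration only, not the authors' implementation.

```python
import math
from typing import List, Sequence


def masked_cross_entropy(
    probs: Sequence[Sequence[float]],
    targets: Sequence[int],
    keep: Sequence[bool],
) -> float:
    """Mean cross-entropy over positions whose keep flag is True.

    probs[i]   : draft model's predicted distribution at position i
    targets[i] : index of the reference token at position i
    keep[i]    : False for positions treated as misaligned (hypothetical
                 criterion, decided elsewhere); these add nothing to the loss
    """
    total, count = 0.0, 0
    for p, t, k in zip(probs, targets, keep):
        if k:
            total += -math.log(p[t])
            count += 1
    # With every position masked there is nothing to optimize.
    return total / count if count else 0.0


probs: List[List[float]] = [[0.5, 0.5], [0.9, 0.1], [0.25, 0.75]]
targets = [0, 1, 1]
keep = [True, False, True]  # position 1 is excluded from the loss

loss = masked_cross_entropy(probs, targets, keep)
```

In this toy case the second position, where the draft assigns only 0.1 to the reference token, is dropped, so the loss averages over the two remaining positions instead of being dominated by the excluded one.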
Code & Models
- 🤗 husj576/GRIFFIN-llama3-instruct-8B · model · 19 dl · ♡ 1
- 🤗 husj576/GRIFFIN-llama2-chat-7B · model · 20 dl
- 🤗 husj576/GRIFFIN-llama2-chat-13B · model · 28 dl
- 🤗 husj576/GRIFFIN-Vicuna-7B-v1.5 · model · 21 dl
- 🤗 husj576/GRIFFIN-llama3-instruct-70B · model · 16 dl
- 🤗 husj576/GRIFFIN-qwen2-instruct-7B · model · 3 dl
Taxonomy
Topics: Advanced Data Storage Technologies · Algorithms and Data Compression · Advanced Malware Detection Techniques
