EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees
Yuhui Li, Fangyun Wei, Chao Zhang, Hongyang Zhang

TL;DR
EAGLE-2 enhances speculative sampling for large language models by introducing a context-aware dynamic draft tree, significantly speeding up inference while maintaining output quality.
Contribution
It proposes a novel context-aware dynamic draft tree technique for speculative sampling, improving inference speed over previous static methods.
Findings
Achieves 3.05x-4.26x speedup ratios
20%-40% faster than EAGLE-1
Maintains unchanged output distribution
Abstract
Inference with modern Large Language Models (LLMs) is expensive and time-consuming, and speculative sampling has proven to be an effective solution. Most speculative sampling methods such as EAGLE use a static draft tree, implicitly assuming that the acceptance rate of draft tokens depends only on their position. Interestingly, we found that the acceptance rate of draft tokens is also context-dependent. In this paper, building upon EAGLE, we propose EAGLE-2, which introduces a new technique of context-aware dynamic draft tree into drafting modeling. This improvement leverages the fact that the draft model of EAGLE is well-calibrated: the confidence scores from the draft model approximate acceptance rates with small errors. We conducted extensive evaluations on three series of LLMs and six tasks, with EAGLE-2 achieving speedup ratios 3.05x-4.26x, which is 20%-40% faster than EAGLE-1.…
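The abstract's central idea, using draft-model confidence as a proxy for acceptance rate to decide where the draft tree should grow, can be illustrated with a small sketch. This is not the authors' implementation. The scoring rule (product of confidences along a path), the `top_k`, `depth`, and `budget` parameters, and the `toy_draft` lookup table are all assumptions introduced here for illustration.

```python
import heapq
from typing import Callable, Dict, List, Tuple

Path = Tuple[str, ...]
# Hypothetical draft-model interface: token context -> {next_token: confidence}.
DraftFn = Callable[[Path], Dict[str, float]]


def build_dynamic_draft_tree(
    draft_fn: DraftFn, context: Path, depth: int, top_k: int, budget: int
) -> List[Tuple[Path, float]]:
    """Grow a draft tree guided by confidence, then keep the best `budget` nodes.

    A node's value is the product of draft confidences along its path, used
    here as an assumed estimate of the chance the whole path is accepted.
    """
    frontier: List[Tuple[Path, float]] = [((), 1.0)]  # root: empty draft path
    kept: List[Tuple[Path, float]] = []
    for _ in range(depth):
        candidates = []
        for path, value in frontier:
            for token, conf in draft_fn(context + path).items():
                candidates.append((path + (token,), value * conf))
        if not candidates:
            break
        # Expand only the most promising nodes, so the shape depends on context.
        frontier = heapq.nlargest(top_k, candidates, key=lambda n: n[1])
        kept.extend(frontier)
    # Rerank all kept nodes; a child never outscores its parent, and ties keep
    # insertion order, so the selected nodes still form a connected tree.
    return heapq.nlargest(budget, kept, key=lambda n: n[1])


def toy_draft(context: Path) -> Dict[str, float]:
    """Toy stand-in for a draft model, keyed on the last token only."""
    table = {
        "=": {"1": 0.9, "2": 0.1},
        "1": {"2": 0.8, "0": 0.2},
        "2": {"<eos>": 0.6, "0": 0.4},
    }
    return table.get(context[-1], {}) if context else {}


tree = build_dynamic_draft_tree(
    toy_draft, ("10", "+", "2", "="), depth=3, top_k=2, budget=4
)
for path, value in tree:
    print(path, round(value, 3))
```

With a static tree, the branch under the low-confidence token "2" would receive the same share of the draft budget regardless of context. In this sketch the budget instead concentrates along the high-confidence path starting with "1".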
Code & Models
- 🤗 yuhuili/EAGLE-Vicuna-7B-v1.3 · model · 507 dl · ♡ 3
- 🤗 yuhuili/EAGLE-Vicuna-33B-v1.3 · model · 10 dl
- 🤗 yuhuili/EAGLE-Vicuna-13B-v1.3 · model · 200 dl
- 🤗 yuhuili/EAGLE-llama2-chat-7B · model · 488 dl · ♡ 5
- 🤗 yuhuili/EAGLE-llama2-chat-13B · model · 41 dl
- 🤗 yuhuili/EAGLE-llama2-chat-70B · model · 20 dl · ♡ 1
- 🤗 yuhuili/EAGLE-mixtral-instruct-8x7B · model · 21 dl
- 🤗 yuhuili/EAGLE-LLaMA3-Instruct-8B · model · 85k dl · ♡ 6
- 🤗 yuhuili/EAGLE-LLaMA3-Instruct-70B · model · 424 dl · ♡ 6
- 🤗 yuhuili/EAGLE-Qwen2-7B-Instruct · model · 432 dl · ♡ 2
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
