EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
Yuhui Li, Fangyun Wei, Chao Zhang, Hongyang Zhang

TL;DR
EAGLE is a speculative sampling method that drafts at the feature (second-to-top-layer) level rather than the token level, substantially speeding up large language model inference without changing the output distribution.
Contribution
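The paper presents EAGLE, a speculative sampling framework that improves inference efficiency by running autoregression at the feature (second-to-top-layer) level and feeding the draft step a token sequence advanced by one time step, in contrast to approaches that draft purely at the token level.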
Findings
Achieved 2.7x-3.5x latency speedup on LLaMA2-Chat 70B.
Doubled throughput without changing output distribution.
Effective across multiple models (including the Vicuna and LLaMA2-Chat series and the MoE model Mixtral 8x7B) and tasks.
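The finding that the output distribution is unchanged rests on the acceptance rule used by speculative sampling in general. The sketch below illustrates that rule for a single drafted token. It is a generic illustration and not EAGLE's code: the function name `verify_draft_token` and the dictionary representation of the distributions are assumptions made for this example.

```python
import random


def verify_draft_token(token, p_target, q_draft, rng=random):
    """Accept or replace one drafted token (generic speculative sampling).

    p_target, q_draft: dicts mapping token -> probability under the target
    model and the draft model at the same position (hypothetical format).
    Returns (accepted, emitted_token).
    """
    # Accept the drafted token with probability min(1, p(x) / q(x)).
    # q_draft[token] > 0 because the token was sampled from q_draft.
    ratio = p_target.get(token, 0.0) / q_draft[token]
    if rng.random() < min(1.0, ratio):
        return True, token

    # On rejection, resample from the residual max(0, p - q), renormalised.
    # A rejection implies p != q, so the residual mass is strictly positive.
    residual = {t: max(0.0, p - q_draft.get(t, 0.0)) for t, p in p_target.items()}
    total = sum(residual.values())
    tokens = list(residual)
    weights = [residual[t] / total for t in tokens]
    return False, rng.choices(tokens, weights=weights, k=1)[0]
```

Under this rule the emitted token follows the target distribution exactly, whatever the draft distribution is; a better draft only raises the acceptance rate, which is where the speedup comes from.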
Abstract
Autoregressive decoding makes the inference of Large Language Models (LLMs) time-consuming. In this paper, we reconsider speculative sampling and derive two key observations. Firstly, autoregression at the feature (second-to-top-layer) level is more straightforward than at the token level. Secondly, the inherent uncertainty in feature (second-to-top-layer) level autoregression constrains its performance. Based on these insights, we introduce EAGLE (Extrapolation Algorithm for Greater Language-model Efficiency), a simple yet highly efficient speculative sampling framework. By incorporating a token sequence advanced by one time step, EAGLE effectively resolves the uncertainty, enabling precise second-to-top-layer feature prediction with minimal overhead. We conducted comprehensive evaluations of EAGLE, including all models from the Vicuna and LLaMA2-Chat series, the MoE model Mixtral 8x7B…
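One way to read "a token sequence advanced by one time step" is that each feature is paired with the token that was sampled after it, so the draft step sees which token the sampling actually produced. The sketch below shows only that pairing, under this reading. The function name `build_draft_inputs` and the plain-list representation are hypothetical; how the pairs are embedded and consumed by the draft model is not shown.

```python
def build_draft_inputs(features, tokens):
    """Pair each feature f_i with the token t_{i+1} sampled after it.

    features: list of feature vectors f_1..f_n (second-to-top-layer outputs).
    tokens:   list of token ids t_1..t_{n+1}; one longer than `features`,
              because the newest token has been sampled but has not yet
              been run through the target model.
    """
    if len(tokens) != len(features) + 1:
        raise ValueError("expected exactly one more token than features")
    # Advance the token sequence by one step: f_i is paired with t_{i+1}.
    return list(zip(features, tokens[1:]))


# Example: two features and three tokens give two (feature, next-token) pairs.
pairs = build_draft_inputs([[0.1, 0.2], [0.3, 0.4]], [11, 12, 13])
```

The point of the shift is that a feature alone does not determine the next feature, since the next token is drawn at random; supplying the sampled token removes that ambiguity.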
Code & Models
- 🤗 yuhuili/EAGLE-Vicuna-7B-v1.3 · model · 507 dl · ♡ 3
- 🤗 yuhuili/EAGLE-Vicuna-33B-v1.3 · model · 10 dl
- 🤗 yuhuili/EAGLE-Vicuna-13B-v1.3 · model · 200 dl
- 🤗 yuhuili/EAGLE-llama2-chat-7B · model · 488 dl · ♡ 5
- 🤗 yuhuili/EAGLE-llama2-chat-13B · model · 41 dl
- 🤗 yuhuili/EAGLE-llama2-chat-70B · model · 20 dl · ♡ 1
- 🤗 yuhuili/EAGLE-mixtral-instruct-8x7B · model · 21 dl
- 🤗 yuhuili/EAGLE-LLaMA3-Instruct-8B · model · 85k dl · ♡ 6
- 🤗 yuhuili/EAGLE-LLaMA3-Instruct-70B · model · 424 dl · ♡ 6
- 🤗 yuhuili/EAGLE-Qwen2-7B-Instruct · model · 432 dl · ♡ 2
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Machine Learning and Data Classification
Methods: Lookahead
