P-EAGLE: Parallel-Drafting EAGLE with Scalable Training

Mude Hui; Xin Huang; Jaime Campos Salas; Yue Sun; Nathan Pemberton; Xiang Song; Ashish Khetan; George Karypis

arXiv:2602.01469·cs.LG·February 3, 2026

TL;DR

P-EAGLE is a parallel-drafting variant of EAGLE for speculative decoding with reasoning LLMs. Parallel drafting predicts multiple tokens per forward pass to cut drafting latency, and attention mask pre-computation plus sequence partitioning make it practical to train the drafter on long contexts.

Contribution

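P-EAGLE converts EAGLE from autoregressive to parallel multi-token prediction via a learnable shared hidden state. It also provides a training framework that keeps long-context training tractable, even though training cost grows quadratically with the product of sequence length and parallel positions.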
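To make the quadratic scaling concrete, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that training with K parallel prediction positions expands a sequence of n tokens into n*K attention positions, and that a dense mask over them costs one byte per entry. The function name and the layout are hypothetical and are not taken from the paper.

```python
def dense_mask_bytes(seq_len: int, parallel_positions: int,
                     bytes_per_entry: int = 1) -> int:
    """Size in bytes of a dense (n*K) x (n*K) attention mask.

    Assumption (not from the paper): each of the n tokens is expanded
    into K parallel prediction positions, so attention runs over n*K
    positions and a dense mask has (n*K)^2 entries.
    """
    total_positions = seq_len * parallel_positions
    return total_positions * total_positions * bytes_per_entry


if __name__ == "__main__":
    # Print how the mask grows as sequence length and K increase.
    for n in (2_048, 8_192):
        for k in (1, 4, 8):
            gib = dense_mask_bytes(n, k) / 2**30
            print(f"n={n:>5}  K={k}  dense mask ~ {gib:.3f} GiB")
```

Under these assumptions, doubling either the sequence length or the number of parallel positions quadruples the mask size, which is the growth pattern that makes naive long-context training impractical.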

Findings

01. Achieves 1.10-1.36x speedup over autoregressive EAGLE-3 when implemented in vLLM

02. Enables training on longer contexts through attention mask pre-computation and sequence partitioning

03. Demonstrates these speedups on GPT-OSS 120B, GPT-OSS 20B, and Qwen3-Coder 30B

Abstract

Reasoning LLMs produce longer outputs, requiring speculative decoding drafters trained on extended sequences. Parallel drafting - predicting multiple tokens per forward pass - offers latency benefits over sequential generation, but training complexity scales quadratically with the product of sequence length and parallel positions, rendering long-context training impractical. We present P(arallel)-EAGLE, which transforms EAGLE from autoregressive to parallel multi-token prediction via a learnable shared hidden state. To scale training to long contexts, we develop a framework featuring attention mask pre-computation and sequence partitioning techniques, enabling gradient accumulation within individual sequences for parallel-prediction training. We implement P-EAGLE in vLLM and demonstrate speedups of 1.10-1.36x over autoregressive EAGLE-3 across GPT-OSS 120B, 20B, and Qwen3-Coder 30B.
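The idea of accumulating gradients within a single sequence can be illustrated with a toy sketch. It assumes a loss that is a plain sum over positions (a scalar linear model with squared error), so the gradient over the full sequence equals the sum of gradients over contiguous segments. It ignores attention entirely: in a real drafter, later positions attend to earlier ones, so the partitioning used by P-EAGLE has to handle cross-segment dependencies that this sketch does not model. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def grad_sum_squared_error(w: float, xs: Sequence[float],
                           ys: Sequence[float]) -> float:
    """Gradient with respect to w of sum_i (w * x_i - y_i)^2."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys))


def partition(n: int, segment_len: int) -> List[Tuple[int, int]]:
    """Split positions [0, n) into contiguous [start, end) segments."""
    return [(s, min(s + segment_len, n)) for s in range(0, n, segment_len)]


def accumulated_grad(w: float, xs: Sequence[float], ys: Sequence[float],
                     segment_len: int) -> float:
    """Process one segment at a time and accumulate its gradient.

    Only one segment is handled per step, which is what bounds peak
    memory in this toy setting.
    """
    total = 0.0
    for start, end in partition(len(xs), segment_len):
        total += grad_sum_squared_error(w, xs[start:end], ys[start:end])
    return total


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [2.0, 4.0, 6.0, 8.0, 10.0]
    full = grad_sum_squared_error(0.5, xs, ys)
    chunked = accumulated_grad(0.5, xs, ys, segment_len=2)
    print(full, chunked)  # the two values agree
```

The design point is that the optimizer step happens only after every segment of the sequence has contributed, so the update matches what a single pass over the whole sequence would produce for a loss of this form.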

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Adversarial Robustness in Machine Learning