EAGLE-Pangu: Accelerator-Safe Tree Speculative Decoding on Ascend NPUs
Chang Han, Yijie Hu, Jingling Liu

TL;DR
EAGLE-Pangu is a system for accelerator-safe tree speculative decoding on Ascend NPUs, improving large language model decoding throughput by 1.27x on average and by up to 2.46x at p99 while remaining robust and reproducible across heterogeneous hardware backends.
Contribution
It presents techniques for safe and efficient tree-structured speculative decoding on Ascend NPUs: an explicit branch/commit cache manager, accelerator-safe tree tensorization, and a fused-kernel-compatible teacher verification path.
Findings
Decoding throughput improves by up to 2.46x at p99
The system is reproducible and debuggable across execution modes
Average throughput improves by 1.27x on benchmark tests
Abstract
Autoregressive decoding remains a primary bottleneck in large language model (LLM) serving, motivating speculative decoding methods that reduce expensive teacher-model invocations by verifying multiple candidate tokens per step. Tree-structured speculation further increases parallelism, but is often brittle when ported across heterogeneous backends and accelerator stacks, where attention masking, KV-cache layouts, and indexing semantics are not interchangeable. We present EAGLE-Pangu, a reproducible system that ports EAGLE-3-style tree speculative decoding to a Pangu teacher backend on Ascend NPUs. EAGLE-Pangu contributes (i) an explicit branch/commit cache manager built on the Cache API, (ii) accelerator-safe tree tensorization that removes undefined negative indices by construction and validates structural invariants, and (iii) a fused-kernel-compatible teacher verification path with…
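To make contribution (ii) concrete, here is a minimal sketch of one way a draft-token tree could be tensorized without negative indices. It is not the authors' implementation. The function names `tensorize_tree` and `ancestor_mask`, the input convention (roots marked with parent `-1`, nodes listed in topological order), and the choice to re-point each root at itself are all assumptions made for illustration.

```python
from typing import List


def tensorize_tree(parents: List[int]) -> List[int]:
    """Return a parent-index list that contains no negative entries.

    Assumed input convention (hypothetical): parents[i] is the index of
    node i's parent, roots are marked with -1, and every parent appears
    before its children. Roots are re-pointed at themselves so that every
    entry is a valid index for a gather-style lookup.
    """
    safe: List[int] = []
    for i, p in enumerate(parents):
        if p < 0:
            safe.append(i)  # root: self-reference instead of -1
        elif p < i:
            safe.append(p)  # ordinary node: parent precedes child
        else:
            # Structural invariant violated: self-loop or forward edge.
            raise ValueError(f"node {i} has invalid parent {p}")
    return safe


def ancestor_mask(safe_parents: List[int]) -> List[List[bool]]:
    """Build a tree attention mask: mask[i][j] is True when j is i or an ancestor of i."""
    n = len(safe_parents)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        j = i
        while True:
            mask[i][j] = True
            if safe_parents[j] == j:  # reached a root
                break
            j = safe_parents[j]
    return mask


if __name__ == "__main__":
    safe = tensorize_tree([-1, 0, 0, 1])
    for row in ancestor_mask(safe):
        print(row)
```

In this sketch a malformed tree raises a `ValueError` during tensorization, before any index reaches the accelerator, which is one simple way to validate structural invariants up front.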
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Explainable Artificial Intelligence (XAI)
