EAGLE-Pangu: Accelerator-Safe Tree Speculative Decoding on Ascend NPUs

Chang Han; Yijie Hu; Jingling Liu

arXiv:2603.08088 · cs.LG · March 10, 2026

Open Access

TL;DR

EAGLE-Pangu is a system for accelerator-safe tree speculative decoding on Ascend NPUs. It reports a 1.27x average improvement in large language model decoding throughput, rising to as much as 2.46x at p99, while aiming for robustness and reproducibility across heterogeneous hardware backends.

Contribution

The paper presents techniques for safe and efficient tree-structured speculative decoding on Ascend NPUs: an explicit branch/commit cache manager, accelerator-safe tree tensorization, and a fused-kernel-compatible teacher verification path.
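
To make the branch/commit idea concrete, here is a minimal Python sketch of a cache that keeps speculative entries separate from committed ones. It is an illustration only, not the paper's cache manager: the class name `BranchCommitCache`, its methods, and the use of strings in place of key/value tensors are all assumptions, and it does not use the Cache API the paper builds on.

```python
from typing import List, Sequence, Tuple


class BranchCommitCache:
    """Toy cache (hypothetical): strings stand in for per-token KV entries."""

    def __init__(self) -> None:
        self._committed: List[str] = []  # entries for verified tokens
        self._draft: List[str] = []      # entries for speculative tokens

    def branch(self, entries: Sequence[str]) -> None:
        """Stage speculative entries without touching committed state."""
        self._draft = list(entries)

    def commit(self, accepted: Sequence[int]) -> None:
        """Keep only the accepted draft entries, in order, and drop the rest."""
        for idx in accepted:
            # Reject negative or out-of-range indices instead of letting
            # Python's negative indexing silently wrap around.
            if not 0 <= idx < len(self._draft):
                raise IndexError(f"draft index {idx} out of range")
        self._committed.extend(self._draft[i] for i in accepted)
        self._draft.clear()

    def rollback(self) -> None:
        """Discard every speculative entry."""
        self._draft.clear()

    @property
    def committed(self) -> Tuple[str, ...]:
        return tuple(self._committed)
```

In this sketch, a verification step would call `branch` with the draft entries, then either `commit` with the indices of the accepted path or `rollback` if nothing was accepted, so the committed state never contains rejected tokens.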

Findings

01

Decoding throughput improved by up to 2.46x at p99

02

System is reproducible and debuggable across execution modes

03

Achieves 1.27x average throughput improvement on benchmark tests

Abstract

Autoregressive decoding remains a primary bottleneck in large language model (LLM) serving, motivating speculative decoding methods that reduce expensive teacher-model invocations by verifying multiple candidate tokens per step. Tree-structured speculation further increases parallelism, but is often brittle when ported across heterogeneous backends and accelerator stacks, where attention masking, KV-cache layouts, and indexing semantics are not interchangeable. We present EAGLE-Pangu, a reproducible system that ports EAGLE-3-style tree speculative decoding to a Pangu teacher backend on Ascend NPUs. EAGLE-Pangu contributes (i) an explicit branch/commit cache manager built on the Cache API, (ii) accelerator-safe tree tensorization that removes undefined negative indices by construction and validates structural invariants, and (iii) a fused-kernel-compatible teacher verification path with…
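
One way to read "removes undefined negative indices by construction" is sketched below in plain Python. This is an assumption about the general technique, not the paper's implementation: the function name `tensorize_tree`, the convention that `-1` marks a node attached to the committed prefix, and the choice of an explicit root at slot 0 are all hypothetical. The sketch shifts every node by one slot so that the `-1` sentinel becomes a valid index, checks that parents precede their children, and builds an ancestor mask of the kind tree attention needs.

```python
from typing import List, Tuple


def tensorize_tree(parents: List[int]) -> Tuple[List[int], List[List[bool]]]:
    """Hypothetical helper: parents[i] is the parent of draft node i,
    or -1 if the node attaches directly to the committed prefix."""
    # Slot 0 becomes an explicit root that is its own parent, so the
    # sentinel -1 maps to index 0 and no negative index remains.
    safe_parents = [0] + [p + 1 for p in parents]
    size = len(safe_parents)

    # Structural invariant: every non-root node's parent is a valid,
    # strictly earlier slot (topological order, no forward references).
    for i in range(1, size):
        p = safe_parents[i]
        if not 0 <= p < i:
            raise ValueError(f"node {i} has invalid parent index {p}")

    # mask[i][j] is True when j is node i itself or one of its ancestors.
    mask = [[False] * size for _ in range(size)]
    for i in range(size):
        mask[i][i] = True
        if i > 0:
            # The parent's row is already complete because parent < i.
            parent_row = mask[safe_parents[i]]
            for j in range(size):
                if parent_row[j]:
                    mask[i][j] = True
    return safe_parents, mask
```

For example, `tensorize_tree([-1, 0, 0, 1])` describes a draft tree with one top-level node, two children under it, and one grandchild. Every returned parent index is non-negative, and each mask row marks only the path from that node back to the root.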

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Explainable Artificial Intelligence (XAI)