Yggdrasil: Bridging Dynamic Speculation and Static Runtime for Latency-Optimal Tree-Based LLM Decoding

Yue Guan; Changming Yu; Shihan Fang; Weiming Hu; Zaifeng Pan; Zheng Wang; Zihan Liu; Yangjie Zhou; Yufei Ding; Minyi Guo; Jingwen Leng

arXiv:2512.23858 · cs.LG · January 1, 2026

TL;DR

Yggdrasil is a system that improves speculative decoding for large language models by reconciling dynamic, context-dependent draft trees with the static-graph assumptions of compiled runtimes, achieving near-optimal latency and speedups of up to 3.98× across hardware platforms.

Contribution

It introduces a co-designed approach that combines context-aware tree drafting, a latency-aware draft-selection objective, and stage-based scheduling to optimize speculative decoding performance.
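One way to read a "latency-aware" draft-selection objective is to choose the draft tree that maximizes expected accepted tokens per unit of wall-clock step latency, rather than expected accepted tokens alone. The sketch below illustrates that idea; it is not the paper's formulation. It assumes each candidate node carries `path_prob`, an estimate of the probability that the node is accepted, which never increases along a path; under that reading, the expected tokens per step is 1 (the target model's own token) plus the sum of `path_prob` over the drafted nodes. `Candidate`, `select_draft`, and both latency functions are hypothetical placeholders, and the latency constants stand in for profiled numbers.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    node_id: int
    parent: Optional[int]  # None: child of the root (last committed token)
    depth: int             # 1 for children of the root
    path_prob: float       # assumed probability that this node is accepted


def verify_latency_ms(num_tokens: int) -> float:
    """Hypothetical cost of one target-model verification pass.

    Assumes a flat base time until the hardware saturates at `knee` tokens,
    then linear growth. Replace with profiled numbers for a real device.
    """
    base_ms, knee, per_token_ms = 20.0, 32, 0.5
    return base_ms + max(0, num_tokens - knee) * per_token_ms


def draft_latency_ms(depth: int) -> float:
    """Hypothetical drafting cost: one small-model step per tree level."""
    return 2.0 * depth


def select_draft(candidates: List[Candidate], max_nodes: int) -> Tuple[List[Candidate], float]:
    """Return the candidate prefix that maximizes expected tokens per ms."""
    # Sort by path probability and break ties by depth, so a parent (whose
    # path_prob is >= its child's) always precedes its children. Every
    # prefix of this order is therefore a connected tree.
    ordered = sorted(candidates, key=lambda c: (-c.path_prob, c.depth))

    # Baseline: no speculation, one token per verification pass.
    best: List[Candidate] = []
    best_rate = 1.0 / verify_latency_ms(1)

    chosen: List[Candidate] = []
    expected_tokens = 1.0  # verification always yields one target-model token
    max_depth = 0
    for c in ordered[:max_nodes]:
        chosen.append(c)
        expected_tokens += c.path_prob
        max_depth = max(max_depth, c.depth)
        # The verification pass scores the root plus every drafted node.
        latency = draft_latency_ms(max_depth) + verify_latency_ms(len(chosen) + 1)
        rate = expected_tokens / latency
        if rate > best_rate:
            best, best_rate = list(chosen), rate
    return best, best_rate
```

For example, with candidates `0` (depth 1, 0.8), `1` (depth 2, 0.6), `2` (depth 1, 0.1), `3` (depth 3, 0.45), and `4` (depth 4, 0.02), `select_draft` keeps nodes 0 to 3 and drops node 4: the extra draft level costs more latency than its 0.02 expected tokens recover. Maximizing raw expected tokens would keep it, which is the distinction a latency-aware objective is meant to capture.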

Findings

1. Achieves up to 3.98× speedup over state-of-the-art baselines
2. Supports unmodified LLMs across multiple hardware setups
3. Enables latency-optimal speculative decoding

Abstract

Speculative decoding improves LLM inference by generating and verifying multiple tokens in parallel, but existing systems suffer from suboptimal performance due to a mismatch between dynamic speculation and static runtime assumptions. We present Yggdrasil, a co-designed system that enables latency-optimal speculative decoding through context-aware tree drafting and compiler-friendly execution. Yggdrasil introduces an equal-growth tree structure for static graph compatibility, a latency-aware optimization objective for draft selection, and stage-based scheduling to reduce overhead. Yggdrasil supports unmodified LLMs and achieves up to 3.98× speedup over state-of-the-art baselines across multiple hardware setups.
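The abstract does not spell out how the equal-growth tree is built. The sketch below is one plausible reading, assuming "equal growth" means every draft step adds the same number of nodes (`width`), chosen as the most likely children across the whole level, so drafting and verification always see tensors of the same shape. The names `build_equal_growth_tree`, `tree_attention_mask`, `toy_draft`, and the `DraftFn` interface are hypothetical illustrations, not Yggdrasil's API.

```python
import heapq
import math
from typing import Callable, List, Tuple

# Hypothetical draft-model interface: given the token path from the root to a
# node, return (token, log_prob) pairs for that node's candidate children.
DraftFn = Callable[[Tuple[int, ...]], List[Tuple[int, float]]]


def build_equal_growth_tree(draft: DraftFn, width: int, depth: int):
    """Grow a draft tree that adds exactly `width` nodes at every level.

    Node 0 is the root (the last committed token). Returns flat lists
    (tokens, parents, scores): parents[i] is node i's parent index and
    scores[i] is the cumulative log-probability of its path.
    """
    tokens, parents, scores = [-1], [-1], [0.0]
    paths: List[Tuple[int, ...]] = [()]
    frontier = [0]
    for _ in range(depth):
        children = []
        for node in frontier:
            for tok, lp in draft(paths[node]):
                children.append((scores[node] + lp, node, tok))
        if len(children) < width:
            raise ValueError("draft model proposed fewer than `width` children")
        # Keep the `width` most likely children across the whole level, so
        # each level has the same size and per-step tensor shapes stay fixed.
        frontier = []
        for score, parent, tok in heapq.nlargest(width, children):
            tokens.append(tok)
            parents.append(parent)
            scores.append(score)
            paths.append(paths[parent] + (tok,))
            frontier.append(len(tokens) - 1)
    return tokens, parents, scores


def tree_attention_mask(parents: List[int]) -> List[List[bool]]:
    """mask[i][j] is True iff node j is node i itself or one of its ancestors."""
    n = len(parents)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        j = i
        while j != -1:
            mask[i][j] = True
            j = parents[j]
    return mask


def toy_draft(path: Tuple[int, ...]) -> List[Tuple[int, float]]:
    """Stand-in draft model with a fixed next-token distribution."""
    return [(0, math.log(0.6)), (1, math.log(0.3)), (2, math.log(0.1))]
```

Calling `build_equal_growth_tree(toy_draft, width=2, depth=3)` always yields `1 + width * depth = 7` nodes, and `tree_attention_mask` produces a fixed 7×7 mask for a single verification pass. Under this reading, the design trade-off is that a compiler can trace the draft and verify steps once for that size instead of recompiling per step, while the tree still adapts *which* branches it grows to the current context.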

Videos

Yggdrasil: Bridging Dynamic Speculation and Static Runtime for Latency-Optimal Tree-Based LLM Decoding (SlidesLive)

Taxonomy

Topics: Network Packet Processing and Optimization · Graph Theory and Algorithms · Parallel Computing and Optimization Techniques