Overcoming Joint Intractability with Lossless Hierarchical Speculative Decoding
Yuxuan Zhou, Fei Huang, Heng Li, Fengyi Wu, Tianyu Wang, Jianwei Zhang, Junyang Lin, Zhi-Qi Cheng

TL;DR
This paper introduces Hierarchical Speculative Decoding (HSD), a lossless verification method that overcomes joint intractability, significantly improving acceptance rates and decoding efficiency across diverse models without losing distribution fidelity.
Contribution
HSD is a novel, provably lossless verification approach that balances probability mass to address joint intractability, enhancing speculative decoding performance.
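The probability mass being balanced can be illustrated for a single token position. The sketch below is a toy example, not the paper's hierarchical procedure: the distributions `p` (target) and `q` (draft) and the helper names are made up for illustration. It computes the mass the draft over-assigns (excess) and under-assigns (deficient) relative to the target. The two always match, and under standard token-wise speculative sampling the expected acceptance probability is one minus that amount.

```python
def mass_imbalance(p, q):
    """Return (excess, deficient) mass of draft q relative to target p.

    excess:    mass the draft assigns beyond what the target allows
    deficient: mass the target has that the draft fails to cover
    """
    excess = sum(max(qi - pi, 0.0) for pi, qi in zip(p, q))
    deficient = sum(max(pi - qi, 0.0) for pi, qi in zip(p, q))
    return excess, deficient


def expected_acceptance(p, q):
    """Acceptance probability of token-wise speculative sampling: sum_x min(p(x), q(x))."""
    return sum(min(pi, qi) for pi, qi in zip(p, q))


# Hypothetical 3-token vocabulary (illustrative numbers only).
p = [0.5, 0.3, 0.2]  # target distribution
q = [0.2, 0.3, 0.5]  # draft distribution

excess, deficient = mass_imbalance(p, q)
print(f"excess={excess:.2f} deficient={deficient:.2f} "
      f"accept={expected_acceptance(p, q):.2f}")
```

Because both distributions sum to one, excess and deficient mass are equal for any pair; a lossless verifier has to move the rejected excess onto the deficient tokens.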
Findings
HSD increases acceptance rates across various models and benchmarks.
Integrating HSD into EAGLE-3 improves performance by over 12%.
HSD maintains distribution fidelity while boosting decoding efficiency.
Abstract
Verification is a key bottleneck in improving inference speed while maintaining distribution fidelity in Speculative Decoding. Recent work has shown that sequence-level verification leads to a higher number of accepted tokens compared to token-wise verification. However, existing solutions often rely on surrogate approximations or are constrained by partial information, struggling with joint intractability. In this work, we propose Hierarchical Speculative Decoding (HSD), a provably lossless verification method that significantly boosts the expected number of accepted tokens and overcomes joint intractability by balancing excess and deficient probability mass across accessible branches. Our extensive large-scale experiments demonstrate that HSD yields consistent improvements in acceptance rates across diverse model families and benchmarks. Moreover, its strong explainability and…
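For reference, the token-wise verification that the abstract contrasts with sequence-level verification can be sketched as follows. This is the standard speculative sampling rule (accept with probability min(1, p/q), otherwise resample from the normalized residual max(p − q, 0)), not HSD itself. The function name `verify_tokenwise` and the list-based distributions are assumptions made for illustration.

```python
import random


def verify_tokenwise(draft_tokens, q_dists, p_dists, rng):
    """Baseline token-wise lossless verification (standard speculative sampling).

    draft_tokens[i] is assumed to have been sampled from q_dists[i].
    p_dists holds len(draft_tokens) + 1 target distributions; the last one
    is used for the bonus token when every draft token is accepted.
    Returns the tokens emitted for this verification step.
    """
    out = []
    for i, tok in enumerate(draft_tokens):
        p, q = p_dists[i], q_dists[i]
        # Accept the draft token with probability min(1, p(tok) / q(tok)).
        if rng.random() < min(1.0, p[tok] / q[tok]):
            out.append(tok)
            continue
        # Rejected: resample from the residual max(p - q, 0), then stop.
        residual = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
        out.append(rng.choices(range(len(p)), weights=residual)[0])
        return out
    # All draft tokens accepted: sample one bonus token from the target.
    last = p_dists[-1]
    out.append(rng.choices(range(len(last)), weights=last)[0])
    return out


# Hypothetical usage with a 3-token vocabulary and a 2-token draft.
rng = random.Random(0)
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
draft = [rng.choices(range(3), weights=q)[0] for _ in range(2)]
print(verify_tokenwise(draft, [q, q], [p, p, p], rng))
```

Each position is judged in isolation here, so one early rejection discards every later draft token; sequence-level methods such as the one proposed aim to accept more tokens per step while keeping the output distributed as the target.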
Peer Reviews
Decision: ICLR 2026 Oral
1. Exceptionally rigorous theoretical analysis with detailed proofs establishing the lossless property of HSD. The paper clearly demonstrates how HSD correctly recovers the target distribution through careful handling of branch divergence and capped ratios. 2. Significant conceptual contribution by identifying and solving the "joint intractability" problem in speculative decoding - how existing methods miscalculate acceptance probabilities for multi-token sequences, leading to suboptimal perform
1. Limited experimental scope - the paper only evaluates HSD using Qwen2.5 models across three benchmarks. A broader evaluation with multiple model families and additional tasks would strengthen the empirical validation. 2. Insufficient comparisons with state-of-the-art speculative decoding frameworks. While thoroughly comparing against tokenwise and blockwise methods, the paper omits systematic comparisons with more advanced approaches. 3. Limited ablation studies to understand the contributio
- The proposed hierarchical branch resampling strategy is a novel and creative approach to addressing joint intractability in speculative decoding. - The theoretical analysis is rigorous, and the experimental validation is comprehensive, showing consistent improvements across various benchmarks. - The paper is clearly written, and complex ideas are explained clearly. The use of figures and equations aids in understanding the methodology and its underlying theory.
- It’s unclear whether the backward scan of HSD introduces any additional computational overhead.
The topic: accelerating generative inference in LLMs is a highly important problem.
1. The flow of the paper is not smooth. The theoretical derivation of “joint intractability” and then the transition to the hierarchical method is abrupt. It is not always clear how the high-level algorithm ties into the low-level proofs and experiments. 2. While the paper describes “hierarchical speculative decoding”, the exact steps, i.e., branch generation, verification hierarchy, mass-balancing, and acceptance criteria, are somewhat buried in dense math, with less intuitive explanation or pseudo-co
Taxonomy
Topics: Wireless Signal Modulation Classification · Generative Adversarial Networks and Image Synthesis · Error Correcting Code Techniques
