Dynamic Delayed Tree Expansion For Improved Multi-Path Speculative Decoding
Rahul Thomas, Teo Kitanovski, Micah Goldblum, Arka Pal

TL;DR
This paper introduces a dynamic delayed tree expansion method with a neural selector to improve multi-path speculative decoding, achieving higher throughput by optimizing verification strategies across diverse models and tasks.
Contribution
It proposes a delayed tree expansion technique and a neural selector that together enable optimal transport (OT)-based verification methods to outperform Traversal Verification in multi-path speculative decoding.
Findings
Traversal Verification consistently outperforms OT-based methods in the paper's matched-setting evaluation across model families, tasks, and sampling regimes.
Delayed tree expansion preserves the target distribution and enhances root-node i.i.d. rollouts.
The neural selector enables OT-based methods to surpass Traversal Verification, increasing throughput by 5%.
Abstract
Multi-path speculative decoding accelerates lossless sampling from a target model by using a cheaper draft model to generate a draft tree of tokens, and then applies a verification algorithm that accepts a subset of these. While prior work has proposed various verification algorithms for i.i.d. rollouts, their relative performance under matched settings remains unclear. In this work, we first present a systematic evaluation of verification strategies across model families, tasks, and sampling regimes, and find that Traversal Verification dominates consistently, with OT-based methods lagging far behind. Our analysis uncovers that this occurs because OT-based methods achieve high multi-token acceptance near the root of the draft tree, while multi-token gains are most impactful deeper in the draft tree, where draft and target distributions diverge. Based on this insight, we propose…
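To make "lossless" concrete, here is a minimal sketch of the standard single-path, single-token speculative sampling acceptance rule, which multi-path verification algorithms generalize. It is not the paper's method: it implements neither Traversal Verification, the OT-based verifiers, nor delayed tree expansion. The function names `verify_token` and `sample_via_draft` and the toy distributions are hypothetical, chosen only for illustration.

```python
import random


def verify_token(p, q, drafted, rng=random):
    """Accept or replace one drafted token (illustrative names, not from the paper).

    p: target-model probabilities over the vocabulary
    q: draft-model probabilities over the same vocabulary
    drafted: index of the token that was sampled from q
    Returns (token, accepted).
    """
    # Accept the drafted token with probability min(1, p/q).
    if rng.random() < min(1.0, p[drafted] / q[drafted]):
        return drafted, True
    # On rejection, resample from the normalized residual max(p - q, 0).
    # A rejection implies p != q, so the residual has positive mass.
    residual = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
    token = rng.choices(range(len(p)), weights=residual)[0]
    return token, False


def sample_via_draft(p, q, rng=random):
    """Draw one token distributed according to p, using a proposal from q."""
    drafted = rng.choices(range(len(q)), weights=q)[0]
    return verify_token(p, q, drafted, rng)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    p = [0.6, 0.3, 0.1]  # toy target distribution (assumed)
    q = [0.2, 0.5, 0.3]  # toy draft distribution (assumed)
    n = 20000
    counts = [0] * len(p)
    for _ in range(n):
        counts[sample_via_draft(p, q, rng)] += 1
    # Empirical frequencies should be close to p, not q.
    print([round(c / n, 3) for c in counts])
```

Whether the drafted token is accepted or replaced, the emitted token follows the target distribution `p`; the draft distribution `q` only affects how often the cheap proposal is accepted. Multi-path schemes aim to raise that acceptance rate by verifying several drafted candidates at once.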
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications
