Dynamic Delayed Tree Expansion For Improved Multi-Path Speculative Decoding

Rahul Thomas; Teo Kitanovski; Micah Goldblum; Arka Pal

arXiv:2602.16994·cs.LG·February 20, 2026

Open Access

TL;DR

This paper introduces a dynamic delayed tree expansion method with a neural selector to improve multi-path speculative decoding, achieving higher throughput by optimizing verification strategies across diverse models and tasks.

Contribution

It proposes a novel delayed tree expansion technique and a neural selector that together enable OT-based verification methods to outperform Traversal Verification in multi-path decoding.

Findings

01

In the paper's systematic evaluation across model families, tasks, and sampling regimes, Traversal Verification consistently outperforms OT-based methods.

02

Delayed tree expansion preserves the target distribution and enhances root-node i.i.d. rollouts.

03

The neural selector enables OT-based methods to surpass Traversal Verification, increasing throughput by 5%.

Abstract

Multi-path speculative decoding accelerates lossless sampling from a target model by using a cheaper draft model to generate a draft tree of tokens and then applying a verification algorithm that accepts a subset of them. While prior work has proposed various verification algorithms for i.i.d. rollouts, their relative performance under matched settings remains unclear. In this work, we first present a systematic evaluation of verification strategies across model families, tasks, and sampling regimes, and find that Traversal Verification dominates consistently, with OT-based methods lagging far behind. Our analysis shows that this occurs because OT-based methods achieve high multi-token acceptance near the root of the draft tree, whereas multi-token gains are most impactful deeper in the draft tree, where the draft and target distributions diverge. Based on this insight, we propose…
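To make "lossless" concrete, the following is a minimal sketch of the standard single-path, single-token accept/reject rule from the speculative sampling literature. It is not the paper's tree-based, OT-based, or Traversal Verification procedure, which the truncated abstract does not specify. The names `verify_token`, `p`, `q`, and `drafted` are hypothetical, chosen here for illustration only.

```python
import random


def verify_token(p, q, drafted, rng):
    """Single-token speculative check (illustrative, not the paper's method).

    p: target-model probabilities over the vocabulary
    q: draft-model probabilities over the vocabulary
    drafted: index of a token that was sampled from q
    Returns (token, accepted).
    """
    # Accept the drafted token with probability min(1, p/q).
    if rng.random() < min(1.0, p[drafted] / q[drafted]):
        return drafted, True

    # On rejection, resample from the residual max(p - q, 0), renormalised.
    # Rejection implies q[drafted] > p[drafted], so the residual mass is > 0.
    residual = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
    r = rng.random() * sum(residual)
    acc = 0.0
    for tok, w in enumerate(residual):
        acc += w
        if r < acc:
            return tok, False
    # Floating-point fallback: last token with positive residual mass.
    return max(i for i, w in enumerate(residual) if w > 0), False


if __name__ == "__main__":
    rng = random.Random(0)
    p = [0.6, 0.3, 0.1]  # assumed toy target distribution
    q = [0.2, 0.5, 0.3]  # assumed toy draft distribution
    n = 100_000
    counts = [0] * len(p)
    for _ in range(n):
        d = rng.choices(range(len(q)), weights=q)[0]
        tok, _ = verify_token(p, q, d, rng)
        counts[tok] += 1
    # Empirical frequencies should be close to p, not q.
    print([round(c / n, 3) for c in counts])
```

Under this rule the emitted token is distributed according to `p` regardless of how far `q` deviates from it; a poor draft only lowers the acceptance rate. Multi-path methods such as those compared in the paper extend this guarantee to a tree of drafted tokens.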

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications