Accelerating Speculative Decoding with Block Diffusion Draft Trees

Liran Ringel; Yaniv Romano

arXiv:2604.12989 · cs.CL · April 15, 2026

TL;DR

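This paper introduces DDTree (Diffusion Draft Tree), a method that builds a draft tree from the per-position distributions of a block diffusion drafter, so that the target autoregressive model can verify several candidate continuations per round instead of a single drafted trajectory, with the aim of increasing acceptance length in speculative decoding.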

Contribution

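The paper presents DDTree, which builds a draft tree from a block diffusion drafter under a fixed node budget. The tree is verified by the target model in one pass, which allows longer accepted sequences per speculative decoding round than verifying a single drafted trajectory.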
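
To make the idea of "acceptance along a draft tree" concrete, here is a minimal sketch of greedy tree verification. It is an illustration under stated assumptions, not the paper's procedure: the function name `accept_path`, the flat `parents`/`tokens` tree encoding, and the use of greedy (argmax) matching are all hypothetical choices made for this example.

```python
def accept_path(parents, tokens, target_next, root_next):
    """Greedy verification sketch over a draft tree (hypothetical helper).

    parents[i]     : index of node i's parent, or -1 if it hangs off the root
    tokens[i]      : drafted token stored at node i
    target_next[i] : token the target model would emit after node i (assumed greedy)
    root_next      : token the target model would emit right after the prefix
    Returns (accepted node indices in order, the target's next token after them).
    """
    # Index children by (parent -> token -> node) for O(1) lookups.
    children = {}
    for i, p in enumerate(parents):
        children.setdefault(p, {})[tokens[i]] = i

    accepted = []
    cur, want = -1, root_next
    # Follow the unique branch that agrees with the target at every step.
    while want in children.get(cur, {}):
        cur = children[cur][want]
        accepted.append(cur)
        want = target_next[cur]
    return accepted, want
```

With a single drafted trajectory, the walk stops at the first mismatch; with a tree, a sibling branch may still match, which is how a tree can lengthen the accepted prefix.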

Findings

1. DDTree outperforms previous methods in speculative decoding speed.
2. The approach achieves longer acceptance lengths under a fixed node budget.
3. The whole draft tree can be verified in a single forward pass of the target model using ancestor-only attention.
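
Ancestor-only attention means each tree node attends to itself and to the nodes on its path to the root, but not to siblings or other branches. The sketch below builds such a mask for the drafted nodes only; it assumes a hypothetical flat encoding in which `parents[i]` is the parent index of node `i` (or -1 for children of the root) and parents always precede their children. In a real system every node would also attend to the already-decoded prefix, which is omitted here.

```python
def ancestor_mask(parents):
    """Build a boolean mask where mask[i][j] is True iff node j is node i
    itself or one of its ancestors. Assumes parents[i] < i for all i."""
    n = len(parents)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        mask[i][i] = True  # every node attends to itself
        p = parents[i]
        if p >= 0:
            # Inherit the parent's row: its ancestors are also ours.
            for j in range(n):
                if mask[p][j]:
                    mask[i][j] = True
    return mask
```

For example, with `parents = [-1, 0, -1, 2]` there are two independent branches, and node 3 can see nodes 2 and 3 but neither node 0 nor node 1.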

Abstract

Speculative decoding accelerates autoregressive language models by using a lightweight drafter to propose multiple future tokens, which the target model then verifies in parallel. DFlash shows that a block diffusion drafter can generate an entire draft block in a single forward pass and achieve state-of-the-art speculative decoding performance, outperforming strong autoregressive drafters such as EAGLE-3. Vanilla DFlash, however, still verifies only a single drafted trajectory per round, potentially limiting its acceptance length. We introduce DDTree (Diffusion Draft Tree), a method that constructs a draft tree directly from the per-position distributions of a block diffusion drafter. Under a fixed node budget, DDTree uses a simple best-first heap algorithm to select the continuations that are most likely to match the target model according to a surrogate defined by the draft model's…
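
The best-first selection under a node budget can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: it assumes the surrogate score of a path is the product of the drafter's per-position probabilities (treated as independent across positions), and the function name `build_draft_tree` and its data layout are hypothetical.

```python
import heapq
import math

def build_draft_tree(probs, budget):
    """probs[d] maps token -> drafter probability at draft position d.
    Returns up to `budget` nodes as (parent_index, token, log_score),
    in descending score order; parent_index -1 denotes the root."""
    # Rank each position's tokens once, most probable first.
    ranked = [sorted(p.items(), key=lambda kv: -kv[1]) for p in probs]
    nodes, heap, counter = [], [], 0

    def push(parent, depth, rank, parent_logp):
        nonlocal counter
        if depth < len(ranked) and rank < len(ranked[depth]):
            _, p = ranked[depth][rank]
            if p > 0:
                score = parent_logp + math.log(p)
                # heapq is a min-heap, so store the negated score.
                heapq.heappush(heap, (-score, counter, parent, depth, rank, parent_logp))
                counter += 1

    push(-1, 0, 0, 0.0)
    while heap and len(nodes) < budget:
        neg, _, parent, depth, rank, parent_logp = heapq.heappop(heap)
        nodes.append((parent, ranked[depth][rank][0], -neg))
        idx = len(nodes) - 1
        push(idx, depth + 1, 0, -neg)               # best child of the new node
        push(parent, depth, rank + 1, parent_logp)  # next-best sibling
    return nodes
```

Because a child or a next-ranked sibling never scores higher than the node that spawned it, popping from the heap yields nodes in non-increasing score order, so the first `budget` pops are the highest-scoring prefixes under this surrogate.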

Code & Models

Repositories

liranringel/ddtree (GitHub)