Traversal Verification for Speculative Tree Decoding

Yepeng Weng; Qiao Hu; Xujie Chen; Li Liu; Dianwen Mei; Huishi Qiu; Jiang Tian; Zhongchao Shi

arXiv:2505.12398 · cs.CL · November 6, 2025

TL;DR

This paper proposes Traversal Verification, a speculative decoding algorithm for large language models that verifies drafted token trees from leaf to root instead of from root to leaf. It improves acceptance length and throughput while keeping inference lossless.

Contribution

It introduces a leaf-to-root traversal verification approach that provably preserves the target model's output probability distribution while making speculative decoding more efficient.

Findings

01 Improves acceptance length and throughput in large language models

02 Guarantees lossless inference through theoretical proof

03 Consistently outperforms existing speculative decoding methods

Abstract

Speculative decoding is a promising approach for accelerating large language models. The primary idea is to use a lightweight draft model to speculate the output of the target model for multiple subsequent timesteps, and then verify them in parallel to determine whether the drafted tokens should be accepted or rejected. To enhance acceptance rates, existing frameworks typically construct token trees containing multiple candidates in each timestep. However, their reliance on token-level verification mechanisms introduces two critical limitations: First, the probability distribution of a sequence differs from that of individual tokens, leading to suboptimal acceptance length. Second, current verification schemes begin from the root node and proceed layer by layer in a top-down manner. Once a parent node is rejected, all its child nodes should be discarded, resulting in inefficient…
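
For orientation, the token-level, top-down verification that the abstract describes as the existing baseline can be sketched as below. This is a minimal illustration of the widely used speculative sampling acceptance rule on a single drafted chain, not the paper's Traversal Verification. The function name `verify_chain` and the dictionary-based probability tables are assumptions made for this example, and the bonus token that real systems usually sample after a fully accepted draft is left out for brevity.

```python
import random


def verify_chain(draft_tokens, q_probs, p_probs, rng=random):
    """Token-level, top-down verification of one drafted chain (baseline sketch).

    draft_tokens: token ids proposed by the draft model, in order
    q_probs[i]:   dict token -> draft-model probability at step i
    p_probs[i]:   dict token -> target-model probability at step i
    Returns the accepted prefix, plus one resampled token if a rejection occurs.
    """
    accepted = []
    for i, tok in enumerate(draft_tokens):
        p, q = p_probs[i], q_probs[i]
        # Accept the drafted token with probability min(1, p(tok) / q(tok)).
        if rng.random() < min(1.0, p.get(tok, 0.0) / q[tok]):
            accepted.append(tok)
            continue
        # Rejected: resample from the residual distribution max(p - q, 0),
        # renormalized so that it sums to one.
        residual = {t: max(p[t] - q.get(t, 0.0), 0.0) for t in p}
        total = sum(residual.values())
        tokens = list(residual)
        weights = [residual[t] / total for t in tokens]
        accepted.append(rng.choices(tokens, weights=weights)[0])
        # Every drafted token after the rejection point is discarded.
        break
    return accepted
```

Calling `verify_chain([1, 2], q, p)` with two per-step probability tables returns either both drafted tokens or an accepted prefix followed by one resampled token. The `break` after the first rejection is the behaviour the abstract criticizes: all later drafted tokens are thrown away regardless of how likely they are under the target model.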

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Generative Adversarial Networks and Image Synthesis