Traversal Verification for Speculative Tree Decoding
Yepeng Weng, Qiao Hu, Xujie Chen, Li Liu, Dianwen Mei, Huishi Qiu, Jiang Tian, Zhongchao Shi

TL;DR
This paper proposes Traversal Verification, a novel speculative decoding method for large language models that improves acceptance length and throughput by rethinking verification from leaf to root, ensuring lossless inference.
Contribution
It introduces a leaf-to-root traversal verification approach that guarantees identical probability distribution to the target model, enhancing efficiency in speculative decoding.
Findings
Improves acceptance length and throughput in large language models
Guarantees lossless inference through theoretical proof
Consistently outperforms existing speculative decoding methods
Abstract
Speculative decoding is a promising approach for accelerating large language models. The primary idea is to use a lightweight draft model to speculate the output of the target model for multiple subsequent timesteps, and then verify them in parallel to determine whether the drafted tokens should be accepted or rejected. To enhance acceptance rates, existing frameworks typically construct token trees containing multiple candidates in each timestep. However, their reliance on token-level verification mechanisms introduces two critical limitations: First, the probability distribution of a sequence differs from that of individual tokens, leading to suboptimal acceptance length. Second, current verification schemes begin from the root node and proceed layer by layer in a top-down manner. Once a parent node is rejected, all its child nodes should be discarded, resulting in inefficient…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Natural Language Processing Techniques · Generative Adversarial Networks and Image Synthesis
