OPT-Tree: Speculative Decoding with Adaptive Draft Tree Structure

Jikai Wang; Yi Su; Juntao Li; Qingrong Xia; Zi Ye; Xinyu Duan; Zhefeng Wang; Min Zhang

arXiv:2406.17276·cs.CL·April 25, 2025

TL;DR

OPT-Tree introduces an adaptive draft tree structure for speculative decoding. By choosing the tree that maximizes the expected acceptance length at each decoding step, it speeds up inference in autoregressive language models.

Contribution

The paper proposes an adaptive and scalable algorithm that constructs draft trees to maximize the expected acceptance length, outperforming fixed heuristic draft structures.

Findings

01. Achieves up to 3.2x speed-up over autoregressive decoding.

02. Can generate more than ten tokens in a single step with sufficient resources.

03. Outperforms existing draft structures in experimental evaluations.

Abstract

Autoregressive language models demonstrate excellent performance in various scenarios. However, their inference efficiency is limited by the one-step-one-word generation mode, which has become a pressing problem as models grow increasingly large. Speculative decoding employs a "draft and then verify" mechanism to allow multiple tokens to be generated in one step, realizing lossless acceleration. Existing methods mainly adopt fixed heuristic draft structures, which fail to adapt to different situations to maximize the acceptance length during verification. To address this limitation, we propose OPT-Tree, an algorithm to construct adaptive and scalable draft trees. It searches for the optimal tree structure that maximizes the mathematical expectation of the acceptance length in each decoding step. Experimental results reveal that OPT-Tree outperforms the existing draft…
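The tree search described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that the probability the draft model assigns to a token approximates the chance that token survives verification, so the expected acceptance length of a tree is the sum of the path probabilities of its nodes. Under that assumption, greedily keeping the highest-probability nodes within a node budget maximizes the sum. The names `build_draft_tree`, `expected_acceptance_length`, `toy_draft`, and `budget` are hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Tuple

Path = Tuple[str, ...]
# Hypothetical interface: maps a token path (from the root) to candidate
# next tokens with the draft model's probabilities.
DraftFn = Callable[[Path], Dict[str, float]]


def build_draft_tree(draft: DraftFn, budget: int) -> List[Tuple[Path, float]]:
    """Greedily select up to `budget` nodes with the highest path probability.

    A child only enters the frontier after its parent is selected, and its
    path probability never exceeds the parent's, so the result is a valid tree.
    """
    # heapq is a min-heap, so probabilities are stored negated.
    frontier = [(-p, (tok,)) for tok, p in draft(()).items()]
    heapq.heapify(frontier)
    chosen: List[Tuple[Path, float]] = []
    while frontier and len(chosen) < budget:
        neg_p, path = heapq.heappop(frontier)
        chosen.append((path, -neg_p))
        # Expand the selected node: child path prob = parent prob * token prob.
        for tok, p in draft(path).items():
            heapq.heappush(frontier, (neg_p * p, path + (tok,)))
    return chosen


def expected_acceptance_length(tree: List[Tuple[Path, float]]) -> float:
    """Sum of path probabilities (assumed proxy for acceptance length)."""
    return sum(p for _, p in tree)


def toy_draft(path: Path) -> Dict[str, float]:
    """Toy stand-in for a draft model: the same distribution at every node."""
    return {"a": 0.6, "b": 0.3, "c": 0.1}


tree = build_draft_tree(toy_draft, budget=4)
score = expected_acceptance_length(tree)
```

With a peaked draft distribution the selected tree grows deep along the likely path, while a flat distribution produces a wider, shallower tree. That is the sense in which the structure adapts at each decoding step instead of following a fixed heuristic shape.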

Code & Models

Repositories

jikai0wang/opt-tree (PyTorch, official)

Videos

OPT-Tree: Speculative Decoding with Adaptive Draft Tree Structure · Underline

Taxonomy

Topics: Natural Language Processing Techniques · Algorithms and Data Compression · Advanced Database Systems and Queries