OPT-Tree: Speculative Decoding with Adaptive Draft Tree Structure
Jikai Wang, Yi Su, Juntao Li, Qingrong Xia, Zi Ye, Xinyu Duan, Zhefeng Wang, Min Zhang

TL;DR
OPT-Tree introduces an adaptive draft tree structure for speculative decoding, significantly improving inference speed and efficiency in autoregressive language models by optimizing acceptance length during decoding.
Contribution
It proposes a novel adaptive and scalable draft tree algorithm that constructs optimal structures to maximize acceptance length, surpassing fixed heuristic methods.
Findings
Achieves up to 3.2x speed-up over autoregressive decoding.
Can generate more than ten tokens in a single step with sufficient resources.
Outperforms existing draft structures in experimental evaluations.
Abstract
Autoregressive language models demonstrate excellent performance in various scenarios. However, their inference efficiency is limited by the one-step-one-word generation mode, which has become a pressing problem as models grow increasingly large. Speculative decoding employs a "draft and then verify" mechanism to allow multiple tokens to be generated in one step, realizing lossless acceleration. Existing methods mainly adopt fixed heuristic draft structures, which fail to adapt to different situations to maximize the acceptance length during verification. To alleviate this dilemma, we propose OPT-Tree, an algorithm to construct adaptive and scalable draft trees. It searches for the optimal tree structure that maximizes the mathematical expectation of the acceptance length in each decoding step. Experimental results reveal that OPT-Tree outperforms the existing draft…
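The idea of choosing a draft tree that maximizes expected acceptance length can be illustrated with a small sketch. It is not taken from the paper's code: it assumes that the draft model's probability for a token approximates the chance that the token is accepted, so the expected number of accepted draft tokens is the sum of path probabilities over all nodes in the tree. The names `build_draft_tree`, `expected_acceptance_length`, and `toy_draft` are hypothetical, and the best-first selection under a node budget is one plausible way to realize the search, not necessarily the authors' exact procedure.

```python
import heapq
from typing import Callable, Dict, Tuple

Path = Tuple[str, ...]  # a draft node is identified by its token path from the root


def build_draft_tree(
    next_probs: Callable[[Path], Dict[str, float]], budget: int
) -> Dict[Path, float]:
    """Select up to `budget` draft nodes with the highest path probability.

    `next_probs(prefix)` is a stand-in for the draft model (an assumption):
    it returns next-token probabilities given the drafted prefix.
    """
    tree: Dict[Path, float] = {}
    # Max-heap via negated path probability; entries are (-path_prob, path).
    frontier = [(-p, (tok,)) for tok, p in next_probs(()).items()]
    heapq.heapify(frontier)
    while frontier and len(tree) < budget:
        neg_p, path = heapq.heappop(frontier)
        tree[path] = -neg_p
        # A child's path probability never exceeds its parent's, so popping
        # in best-first order always yields a connected tree.
        for tok, p in next_probs(path).items():
            heapq.heappush(frontier, (neg_p * p, path + (tok,)))
    return tree


def expected_acceptance_length(tree: Dict[Path, float]) -> float:
    """Sum of path probabilities: expected accepted draft tokens (assumed model)."""
    return sum(tree.values())


def toy_draft(prefix: Path) -> Dict[str, float]:
    """Hypothetical draft model with a fixed, prefix-independent distribution."""
    return {"a": 0.6, "b": 0.3, "c": 0.1}


tree = build_draft_tree(toy_draft, budget=4)
score = expected_acceptance_length(tree)
```

With this toy distribution the selected tree is deep along the confident token "a" and shallow elsewhere, which is the kind of input-dependent shape a fixed heuristic structure cannot provide.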
Taxonomy
Topics: Natural Language Processing Techniques · Algorithms and Data Compression · Advanced Database Systems and Queries
