TALON: Confidence-Aware Speculative Decoding with Adaptive Token Trees
Tianyu Liu, Qitan Lv, Yuhao Shen, Xiao Sun, Xiaoyan Sun

TL;DR
TALON is a training-free, budget-driven speculative decoding framework that adapts the structure of the draft tree at inference time, accelerating large language model inference by up to 5.16x over autoregressive decoding without quality loss.
Contribution
The paper proposes TALON, a training-free adaptive tree expansion method that shapes the draft tree according to token difficulty and context, and that can be plugged into existing tree-based speculative decoding methods.
Findings
Achieves up to 5.16x speedup over autoregressive decoding.
Outperforms state-of-the-art EAGLE-3 across multiple models and datasets.
Balances breadth of exploration against depth in the draft tree.
Abstract
Speculative decoding (SD) has become a standard technique for accelerating LLM inference without sacrificing output quality. Recent advances in speculative decoding have shifted from sequential chain-based drafting to tree-structured generation, where the draft model constructs a tree of candidate tokens to explore multiple possible drafts in parallel. However, existing tree-based SD methods typically build a fixed-width, fixed-depth draft tree, which fails to adapt to the varying difficulty of tokens and contexts. As a result, the draft model cannot dynamically adjust the tree structure to stop early on difficult tokens and extend generation for simple ones. To address these challenges, we introduce TALON, a training-free, budget-driven adaptive tree expansion framework that can be plugged into existing tree-based methods. Unlike static methods, TALON constructs the draft tree…
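To make the idea of budget-driven, confidence-aware tree growth concrete, here is a minimal Python sketch. It is not TALON's actual algorithm: the abstract above is truncated, so the expansion rule used here (best-first growth by cumulative draft probability until a node budget is spent) is an assumption chosen for illustration. The names `expand_tree`, `toy_draft`, `budget`, and `top_k`, and all probabilities, are hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Tuple

Path = Tuple[str, ...]
# Hypothetical stand-in for a draft model: maps a token prefix to a
# next-token probability distribution.
DraftFn = Callable[[Path], Dict[str, float]]


def expand_tree(draft: DraftFn, budget: int, top_k: int = 3) -> List[Tuple[Path, float]]:
    """Grow a draft tree best-first until `budget` nodes are selected.

    Assumed rule (not taken from the paper): always select the frontier
    node with the highest cumulative draft probability.
    Returns (path, cumulative probability) pairs in selection order.
    """
    frontier: List[Tuple[float, Path]] = []  # min-heap on negated probability

    def push_children(prefix: Path, prefix_prob: float) -> None:
        dist = draft(prefix)
        best = sorted(dist.items(), key=lambda kv: -kv[1])[:top_k]
        for token, p in best:
            heapq.heappush(frontier, (-(prefix_prob * p), prefix + (token,)))

    push_children((), 1.0)
    selected: List[Tuple[Path, float]] = []
    while frontier and len(selected) < budget:
        neg_prob, path = heapq.heappop(frontier)
        selected.append((path, -neg_prob))
        if len(selected) < budget:  # skip a draft call whose output is unused
            push_children(path, -neg_prob)
    return selected


def toy_draft(prefix: Path) -> Dict[str, float]:
    # Made-up numbers: confident right after "the", uncertain elsewhere.
    if prefix and prefix[-1] == "the":
        return {"cat": 0.9, "dog": 0.05, "sun": 0.05}
    return {"the": 0.4, "a": 0.35, "one": 0.25}


if __name__ == "__main__":
    for path, prob in expand_tree(toy_draft, budget=4):
        print(" ".join(path), round(prob, 4))
```

In this toy setting the tree goes one level deeper where the draft distribution is peaked (after "the") and otherwise spends the remaining budget on sibling candidates, so the shape depends on the draft model's confidence rather than on a fixed width and depth.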
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Topic Modeling · Natural Language Processing Techniques
