DART: Diffusion-Inspired Speculative Decoding for Fast LLM Inference
Fuliang Liu, Xue Li, Ketai Zhao, Yinxi Gao, Ziyan Zhou, Zhonghui Zhang, Zhibin Wang, Wanchun Dou, Sheng Zhong, Chen Tian

TL;DR
DART is a diffusion-inspired speculative decoding method that predicts logits for multiple future tokens in parallel within a single forward pass, reducing drafting latency and accelerating large language model inference while keeping decoding lossless.
Contribution
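DART contributes a parallel logit prediction approach that removes autoregressive rollouts from the draft model, plus an efficient tree pruning algorithm for building draft token trees. Together they reduce drafting latency and outperform existing methods such as EAGLE3.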
Findings
Achieves 2.03x to 3.44x speedup in decoding time.
Surpasses EAGLE3 by 30% on average in speed.
Maintains high draft accuracy with reduced overhead.
Abstract
Speculative decoding is an effective and lossless approach for accelerating LLM inference. However, existing widely adopted model-based draft designs, such as EAGLE3, improve accuracy at the cost of multi-step autoregressive inference, resulting in high drafting latency and ultimately rendering the drafting stage itself a performance bottleneck. Inspired by diffusion-based large language models (dLLMs), we propose DART, which leverages parallel generation to reduce drafting latency. DART predicts logits for multiple future masked positions in parallel within a single forward pass based on hidden states of the target model, thereby eliminating autoregressive rollouts in the draft model while preserving a lightweight design. Based on these parallel logit predictions, we further introduce an efficient tree pruning algorithm that constructs high-quality draft token trees with…
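The abstract is cut off before it describes the tree pruning algorithm, so the following is only a generic sketch of how a draft token tree could be built from per-position logits produced in one pass. It is not the authors' method. It assumes each future position is scored independently, keeps the `top_k` tokens per position, and expands paths best-first until `max_nodes` are kept. The names `build_draft_tree`, `top_k`, and `max_nodes` are hypothetical.

```python
import heapq
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def build_draft_tree(position_logits, top_k=2, max_nodes=6):
    """Keep the max_nodes highest-scoring paths of a draft token tree.

    position_logits[d] holds the logits for future position d, all assumed
    to come from a single parallel forward pass (hypothetical input).
    Returns a list of (token_path, joint_log_prob), best first.
    """
    log_probs = [log_softmax(row) for row in position_logits]
    # Candidate tokens per position: the top_k ids by log-probability.
    top = [
        sorted(range(len(row)), key=lambda t: row[t], reverse=True)[:top_k]
        for row in log_probs
    ]
    # Max-heap via negated scores; the root is the empty path.
    heap = [(0.0, ())]
    kept = []
    while heap and len(kept) < max_nodes:
        neg_score, path = heapq.heappop(heap)
        if path:
            kept.append((path, -neg_score))
        depth = len(path)
        if depth < len(top):
            for tok in top[depth]:
                child_score = neg_score - log_probs[depth][tok]
                heapq.heappush(heap, (child_score, path + (tok,)))
    return kept


if __name__ == "__main__":
    logits = [[2.0, 1.0, 0.0], [0.0, 3.0, 0.0]]
    for path, score in build_draft_tree(logits, top_k=2, max_nodes=4):
        print(path, round(math.exp(score), 3))
```

Because log-probabilities are never positive, a child path cannot outscore its parent, so best-first expansion always keeps a parent before any of its children and the retained paths form a valid tree. In a full speculative decoding loop, the target model would then verify these paths in a single pass and accept the longest matching prefix.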
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Computational and Text Analysis Methods
