DEER: Draft with Diffusion, Verify with Autoregressive Models
Zicong Cheng, Guo-Wei Yang, Jia Li, Zhijie Deng, Meng-Hao Guo, Shi-Min Hu

TL;DR
DEER introduces a speculative decoding framework that uses a diffusion model for drafting and an autoregressive model for verification, speeding up large language model decoding by up to 5.54x on HumanEval.
Contribution
The paper proposes DEER, a decoding method that pairs diffusion-based drafting with AR verification, overcoming the limitations of sequential AR drafting and enabling faster decoding.
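The draft-verify loop behind this kind of method can be sketched as follows. This is a minimal sketch, not DEER's implementation: it assumes greedy (argmax) verification, and `toy_target` and `toy_drafter` are hypothetical stand-ins for the AR target model and for a block drafter that proposes all k tokens at once, as a parallel dLLM drafter would. A real verifier scores the whole block in a single forward pass; the per-position calls here only emulate that.

```python
from typing import Callable, List, Tuple

Token = int
TargetFn = Callable[[List[Token]], Token]          # prefix -> argmax next token
DrafterFn = Callable[[List[Token], int], List[Token]]  # (prefix, k) -> k draft tokens


def verify_block(prefix: List[Token], draft: List[Token], target_next: TargetFn) -> List[Token]:
    """Greedy verification (assumed acceptance rule, not necessarily DEER's).

    Returns the longest accepted prefix of `draft`, followed by one token from
    the target: the correction at the first mismatch, or a bonus token if the
    whole draft was accepted.
    """
    accepted: List[Token] = []
    for tok in draft:
        # Emulates reading the target's prediction at this draft position.
        expected = target_next(prefix + accepted)
        if tok != expected:
            return accepted + [expected]
        accepted.append(tok)
    return accepted + [target_next(prefix + accepted)]


def speculative_generate(prefix: List[Token], n_new: int, k: int,
                         drafter: DrafterFn, target_next: TargetFn) -> Tuple[List[Token], int]:
    """Alternate block drafting and verification until n_new tokens exist."""
    seq = list(prefix)
    steps = 0
    while len(seq) - len(prefix) < n_new:
        seq += verify_block(seq, drafter(seq, k), target_next)
        steps += 1
    return seq[:len(prefix) + n_new], steps


def toy_target(seq: List[Token]) -> Token:
    # Hypothetical target model: always predicts last token + 1 (mod 10).
    return (seq[-1] + 1) % 10


def toy_drafter(seq: List[Token], k: int) -> List[Token]:
    # Hypothetical block drafter: proposes k tokens at once and is
    # deliberately wrong at the 4th position.
    out = [(seq[-1] + i + 1) % 10 for i in range(k)]
    if k > 3:
        out[3] = (out[3] + 5) % 10
    return out


if __name__ == "__main__":
    out, steps = speculative_generate([0], 8, 6, toy_drafter, toy_target)
    print(out, steps)
```

Because every emitted token is either checked against or produced by the target's argmax, the output matches plain greedy decoding with the target; the drafter only changes how many tokens each verification step yields.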
Findings
DEER achieves up to 5.54x speedup on HumanEval.
DEER drafts up to 32 tokens, surpassing previous methods.
DEER maintains high draft quality through a two-stage training pipeline.
Abstract
Efficiency, as a critical practical challenge for LLM-driven agentic and reasoning systems, is increasingly constrained by the inherent latency of autoregressive (AR) decoding. Speculative decoding mitigates this cost through a draft-verify scheme, yet existing approaches rely on AR draft models (a.k.a. drafters), which introduce two fundamental issues: (1) step-wise uncertainty accumulation leads to a progressive collapse of trust between the target model and the drafter, and (2) the decoding of AR drafters is inherently sequential. Together, these factors limit speedups. In this paper, we show that diffusion large language model (dLLM) drafters can naturally overcome these issues through their fundamentally different probabilistic modeling and efficient parallel decoding strategy. Building on this insight, we introduce DEER, an efficient speculative decoding framework that drafts…
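Why accumulating uncertainty caps the speedup can be seen from a standard back-of-the-envelope calculation. Assume draft token i is accepted with probability a_i given that all earlier draft tokens were accepted; then one verification step emits on average E = 1 + a_1 + a_1 a_2 + ... + a_1 a_2 ... a_k tokens (the leading 1 is the token the target always contributes). The acceptance schedules below are hypothetical illustrations, not numbers from the paper.

```python
from typing import Sequence


def expected_tokens_per_step(alphas: Sequence[float]) -> float:
    """Expected tokens emitted per verification step.

    alphas[i] is the (assumed) probability that draft token i is accepted
    given that every earlier draft token was accepted.
    """
    total, survive = 1.0, 1.0  # the target always contributes one token
    for a in alphas:
        survive *= a           # probability the first i+1 draft tokens all pass
        total += survive
    return total


if __name__ == "__main__":
    k = 32
    flat = [0.9] * k                                # hypothetical: constant acceptance
    decaying = [0.9 * 0.95 ** i for i in range(k)]  # hypothetical: acceptance decays with position
    print(expected_tokens_per_step(flat), expected_tokens_per_step(decaying))
```

With a constant a = 0.9 the expectation can never exceed 1 / (1 - a) = 10 tokens however long the draft is, and a position-dependent decay lowers it further, so long drafts pay off only if acceptance stays close to 1 across positions.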
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
