SpecDiff-2: Scaling Diffusion Drafter Alignment For Faster Speculative Decoding

Jameson Sandler; Jacob K. Christopher; Thomas Hartvigsen; Ferdinando Fioretto

arXiv:2511.00606·cs.CL·November 5, 2025

Open Access

TL;DR

SpecDiff-2 introduces a diffusion-based non-autoregressive drafting method combined with calibration techniques to significantly accelerate Large Language Model inference, overcoming key bottlenecks in speculative decoding.

Contribution

The paper presents a diffusion-based drafting framework, together with calibration methods, that increases drafting parallelism and reduces draft rejection rates in speculative decoding.

Findings

01. Achieves up to 55% increase in tokens-per-second

02. Realizes up to 5.5x speed-up over standard decoding

03. Maintains accuracy while significantly improving inference speed

Abstract

Speculative decoding has become the standard approach for accelerating Large Language Model (LLM) inference. It exploits a lossless draft-then-verify procedure to circumvent the latency of autoregressive decoding, achieving impressive speed-ups. Yet current speculative decoding approaches remain limited by two fundamental bottlenecks: (1) the autoregressive dependency during drafting, which limits parallelism, and (2) frequent rejections of draft tokens caused by misalignment between the draft and verify models. This paper proposes SpecDiff-2, a novel framework to jointly address these two bottlenecks. It leverages discrete diffusion as a non-autoregressive drafter to address bottleneck (1) and develops novel techniques to calibrate discrete diffusion drafters with autoregressive verifiers, addressing bottleneck (2). Experimental results across a comprehensive benchmark suite show that…
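For readers unfamiliar with the draft-then-verify procedure mentioned in the abstract, the following is a minimal sketch of the generic speculative-sampling acceptance rule from the broader speculative decoding literature. It is not SpecDiff-2's drafter or calibration method, which this page does not describe in detail. The function name `verify_draft` and its arguments are hypothetical, and the distributions are assumed to be plain lists of probabilities over a small vocabulary. The sketch illustrates why bottleneck (2) matters: the closer the drafter distribution `q` is to the verifier distribution `p`, the more draft tokens are accepted per verification step.

```python
import random


def verify_draft(draft_tokens, q_probs, p_probs, rng):
    """Generic speculative-sampling verification (illustrative, hypothetical API).

    draft_tokens: token ids proposed by the drafter.
    q_probs[i]:   drafter distribution at draft position i.
    p_probs[i]:   verifier distribution at position i; one extra entry at the
                  end supplies the bonus token when every draft is accepted.
    Returns the accepted prefix plus exactly one verifier-sampled token.
    """
    out = []
    for i, tok in enumerate(draft_tokens):
        p, q = p_probs[i], q_probs[i]
        # Accept the drafted token with probability min(1, p(tok) / q(tok)).
        if rng.random() < min(1.0, p[tok] / q[tok]):
            out.append(tok)
            continue
        # Rejected: resample from the residual max(0, p - q), which keeps the
        # overall output distribution equal to the verifier's (lossless).
        residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
        out.append(rng.choices(range(len(p)), weights=residual)[0])
        return out
    # Every draft token was accepted: sample one bonus token from the verifier.
    last = p_probs[-1]
    out.append(rng.choices(range(len(last)), weights=last)[0])
    return out
```

When the drafter and verifier agree exactly, every draft token is accepted and the step yields `len(draft_tokens) + 1` tokens. When the verifier assigns zero probability to the first drafted token, the step yields a single resampled token, which is the rejection cost that better drafter alignment aims to reduce.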

Taxonomy

Topics: Computational and Text Analysis Methods · Topic Modeling · Generative Adversarial Networks and Image Synthesis