D-PACE: Dynamic Position-Aware Cross-Entropy for Parallel Speculative Drafting

Tianyu Wu; Yu Yao; Zhenting Qi; Han Zheng; Zhuohan Wang; Haoran Ma; Lawrence Liao; Himabindu Lakkaraju; Ju Li; Yilun Du

arXiv:2605.18810 · cs.LG · May 20, 2026

TL;DR

D-PACE introduces a dynamic, position-aware training objective for parallel speculative decoding in large language models. It increases speedup and accepted draft length without altering the inference procedure.

Contribution

It derives per-position training weights from a differentiable surrogate of expected accepted draft length. As a result, the loss adapts to whichever positions currently limit acceptance, improving draft quality and speedup.
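The derivation below shows one way such weights can follow from an acceptance-length surrogate. It is an illustrative assumption, not the paper's exact formula. Assume the drafter's probability p_k of the target token at block position k (k = 1, …, B) acts as an independent acceptance probability, and that a block is accepted up to its first rejection. The surrogate S, the per-position weights w_i, and a weighted cross-entropy are then:

```latex
\mathbb{E}[L] \approx S(p_1,\dots,p_B) = \sum_{k=1}^{B} \prod_{j=1}^{k} p_j,
\qquad
w_i \equiv \frac{\partial S}{\partial \log p_i} = \sum_{k=i}^{B} \prod_{j=1}^{k} p_j,
\qquad
\mathcal{L} = -\sum_{i=1}^{B} \operatorname{sg}(w_i)\,\log p_i .
```

Here sg denotes stop-gradient, so the gradient of the loss with respect to log p_i is -w_i. A gradient step on the loss therefore moves each log p_i in the same direction as ascending S. The weights are recomputed from the current probabilities at every step. When an early position is unreliable, every prefix product past it is small, so later positions receive little weight. As early positions approach p_j ≈ 1, the weight on later positions grows.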

Findings

01. Consistently improves speedup and emitted draft length across multiple benchmarks.

02. Adds 2.3% training-time overhead without changing the inference architecture.

03. Remains effective across various models, depths, and decoding temperatures.

Abstract

Speculative decoding accelerates LLM inference by having a small drafter propose tokens that a larger target model verifies in parallel. Recent diffusion-based parallel drafters such as DFlash predict the full B-token block in one forward pass, enabling deeper drafters and longer accepted blocks. However, existing multi-token drafter objectives often use fixed position-dependent weighting schedules, such as head-dependent weights or block-position decays, which do not adapt as the positions limiting acceptance change during training. To address this, we derive per-position training weights from a differentiable surrogate of expected accepted draft length, matching the weight of each position to its log-probability gradient contribution. The resulting loss, D-PACE (Dynamic Position-Aware Cross-Entropy), shifts training signal toward positions that currently limit acceptance as the…
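The sketch below makes the weighting idea concrete under a simple surrogate that is our assumption, not the paper's definition. Each block position k is accepted independently with the drafter's probability p_k of the target token, so the expected accepted length is S = Σ_k Π_{j≤k} p_j. The function names are hypothetical and not from the authors' code. The code computes S, the per-position weights as suffix sums of prefix products (equal to ∂S/∂log p_i), and a weighted cross-entropy that treats the weights as constants.

```python
import math
from typing import List, Sequence


def prefix_products(probs: Sequence[float]) -> List[float]:
    """Return [p1, p1*p2, ..., p1*...*pB]: probability that the first k positions all pass."""
    out, running = [], 1.0
    for p in probs:
        running *= p
        out.append(running)
    return out


def expected_accepted_length(probs: Sequence[float]) -> float:
    """Surrogate S = sum_k prod_{j<=k} p_j, assuming independent per-position acceptance."""
    return sum(prefix_products(probs))


def dpace_style_weights(probs: Sequence[float]) -> List[float]:
    """Hypothetical weights w_i = dS/d(log p_i) = sum_{k>=i} prod_{j<=k} p_j."""
    prefixes = prefix_products(probs)
    weights = [0.0] * len(prefixes)
    running = 0.0
    # Suffix sums of prefix products, accumulated from the last position backward.
    for i in range(len(prefixes) - 1, -1, -1):
        running += prefixes[i]
        weights[i] = running
    return weights


def dpace_style_loss(probs: Sequence[float]) -> float:
    """Weighted cross-entropy -sum_i w_i log p_i (assumes 0 < p <= 1).

    In an autodiff framework the weights would be detached (stop-gradient)
    before multiplying, so only log p_i carries gradient.
    """
    weights = dpace_style_weights(probs)
    return -sum(w * math.log(p) for w, p in zip(weights, probs))


if __name__ == "__main__":
    probs = [0.9, 0.8, 0.5, 0.2]
    weights = dpace_style_weights(probs)
    # Finite-difference check: nudging log p_i by eps changes S by about w_i * eps.
    eps = 1e-6
    for i, w in enumerate(weights):
        bumped = list(probs)
        bumped[i] *= math.exp(eps)
        fd = (expected_accepted_length(bumped) - expected_accepted_length(probs)) / eps
        print(f"position {i + 1}: weight={w:.4f}, finite-difference={fd:.4f}")
```

For probs = [0.9, 0.8, 0.5, 0.2], the prefix products are 0.9, 0.72, 0.36, and 0.072. The weights are therefore about [2.052, 1.152, 0.432, 0.072], and the finite-difference check matches each weight. In a real drafter, p_k would come from the drafter's softmax over the target token, and recomputing the weights every step is what makes the schedule dynamic. A fixed block-position decay, by contrast, stays the same as training progresses.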
