PARD-2: Target-Aligned Parallel Draft Model for Dual-Mode Speculative Decoding

Zihao An; Taichi Liu; Ziqiong Liu; Dong Li; Ruofeng Liu; Emad Barsoum

arXiv:2605.08632 · cs.CL · May 12, 2026

TL;DR

PARD-2 introduces a dual-mode speculative decoding framework with Confidence-Adaptive Token (CAT) optimization, accelerating LLM inference by up to 6.94× by aligning draft-model training with the inference-time goal of maximizing acceptance length.

Contribution

It reformulates the draft-model training objective around acceptance length rather than per-token prediction accuracy, enabling a single draft model to serve both target-dependent and target-independent modes while delivering substantial speedups.
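
To see why acceptance length, rather than per-token accuracy, is the quantity that matters, consider a simplified model (an assumption for illustration, not the paper's analysis): each of the K drafted positions is accepted independently with probability p_i, and verification stops at the first rejection. The expected number of accepted draft tokens is then E[L] = p_1 + p_1·p_2 + ... + p_1·p_2···p_K, so an error at an early position discards every token after it. The sketch below computes E[L] and a hypothetical per-position weight (the probability that position k is reached at all). The function names are illustrative, and this is not the CAT reweighting rule from the paper.

```python
from itertools import accumulate
from operator import mul
from typing import List, Sequence


def expected_acceptance_length(accept_probs: Sequence[float]) -> float:
    """Expected number of consecutively accepted draft tokens.

    Assumes (for illustration only) that draft position i is accepted
    independently with probability accept_probs[i] and that verification
    stops at the first rejection.
    """
    # accumulate(..., mul) yields p1, p1*p2, p1*p2*p3, ...: the probability
    # that every position up to k is accepted. Summing them gives E[L].
    return sum(accumulate(accept_probs, mul))


def reach_weights(accept_probs: Sequence[float]) -> List[float]:
    """Hypothetical per-position weights: probability that position k is
    reached for verification, i.e. that all earlier positions were accepted.
    This is NOT the paper's CAT reweighting rule."""
    weights: List[float] = []
    reached = 1.0  # the first drafted position is always verified
    for p in accept_probs:
        weights.append(reached)
        reached *= p
    return weights


if __name__ == "__main__":
    probs = [0.9, 0.8, 0.5]
    print(expected_acceptance_length(probs))  # about 1.98
    print(reach_weights(probs))               # about [1.0, 0.9, 0.72]
```

With p = (0.9, 0.8, 0.5), E[L] = 1.98. Raising p_1 by 0.1 lifts E[L] to 2.2, while raising p_3 by the same amount only reaches 2.052: a position-dependent effect that a uniformly weighted per-token loss does not account for.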

Findings

01. Achieves up to 6.94× acceleration in LLM inference.
02. Surpasses EAGLE-3 by 1.9× and PARD by 1.3× on Llama3.1-8B.
03. Supports both target-dependent and target-independent modes.

Abstract

Speculative decoding accelerates Large Language Model (LLM) inference by using a lightweight draft model to propose candidate tokens that are verified in parallel by the target model. However, existing draft model training objectives are not directly aligned with the inference-time goal of maximizing consecutive token acceptance. To address this issue, we reformulate the draft model optimization objective, shifting the focus from token prediction accuracy to the overall acceptance length. In this paper, we build upon PARD to propose PARD-2, a dual-mode speculative decoding framework with Confidence-Adaptive Token (CAT) optimization. This approach adaptively reweights each token to better align with the verification process. Notably, PARD-2 enables a single draft model to support both target-dependent and target-independent modes. Experiments across diverse models and tasks demonstrate…
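
The verification step described in the abstract can be sketched for the simplest case, greedy (temperature-0) decoding. Sampling-based speculative decoding uses a rejection-sampling acceptance rule instead, and this sketch does not reflect PARD-2's specific implementation; the function name `greedy_verify` and its arguments are hypothetical. The target model scores all K drafted tokens in one parallel forward pass, draft tokens are accepted up to the first disagreement, and the number accepted is the acceptance length that PARD-2 trains its draft model to maximize.

```python
from typing import List, Sequence, Tuple


def greedy_verify(draft: Sequence[int],
                  target_argmax: Sequence[int]) -> Tuple[int, List[int]]:
    """Greedy verification of one speculative step (illustrative sketch).

    draft:         K token ids proposed by the draft model.
    target_argmax: K + 1 token ids; target_argmax[i] is the target model's
                   greedy prediction at draft position i, all obtained from
                   ONE parallel forward pass over prefix + draft. The last
                   entry is the target's prediction after all K draft tokens.
    Returns (acceptance_length, tokens_to_append).
    """
    if len(target_argmax) != len(draft) + 1:
        raise ValueError("target_argmax must have len(draft) + 1 entries")
    n = 0
    # Accept draft tokens while they match the target; stop at first mismatch.
    while n < len(draft) and draft[n] == target_argmax[n]:
        n += 1
    # The target's own token at position n is always emitted: it either
    # corrects the first rejected draft token or, if all K were accepted,
    # is a free "bonus" token.
    return n, list(draft[:n]) + [target_argmax[n]]


if __name__ == "__main__":
    print(greedy_verify([5, 7, 9, 2], [5, 7, 3, 8, 1]))  # (2, [5, 7, 3])
```

In this example two draft tokens are accepted and three tokens are emitted from a single target forward pass; without speculation, the same pass would yield only one token.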

Code & Models

Repositories

AMD-AGI/PARD (GitHub)