d3LLM: Ultra-Fast Diffusion LLM using Pseudo-Trajectory Distillation

Yu-Yang Qian; Junda Su; Lanxiang Hu; Peiyuan Zhang; Zhijie Deng; Peng Zhao; Hao Zhang

arXiv:2601.07568 · cs.LG · January 30, 2026

TL;DR

d3LLM introduces a novel training and inference approach for diffusion-based large language models, balancing accuracy and parallelism to enable faster decoding without significant performance loss.

Contribution

The paper proposes pseudo-trajectory distillation (training) and entropy-based multi-block decoding (inference) for diffusion LLMs, achieving a balance between high parallelism and accuracy.

Findings

01. Up to 10x speedup over vanilla LLaDA/Dream.

02. 5x faster than autoregressive models with minimal accuracy loss.

03. Introduces the AUP metric for jointly evaluating accuracy and parallelism.

Abstract

Diffusion large language models (dLLMs) offer capabilities beyond those of autoregressive (AR) LLMs, such as parallel decoding and random-order generation. However, realizing these benefits in practice is non-trivial, as dLLMs inherently face an accuracy-parallelism trade-off. Despite increasing interest, existing methods typically focus on only one side of the coin, targeting either efficiency or performance. To address this limitation, we propose d3LLM (Pseudo-Distilled Diffusion Large Language Model), striking a balance between accuracy and parallelism: (i) during training, we introduce pseudo-trajectory distillation to teach the model which tokens can be decoded confidently at early steps, thereby improving parallelism; (ii) during inference, we employ entropy-based multi-block decoding with a KV-cache refresh mechanism to achieve high parallelism while maintaining accuracy. To…
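The general idea behind entropy-based parallel decoding can be illustrated with a small sketch. This is not the paper's implementation: the function names, the fixed entropy threshold, and the fallback rule are all assumptions made for illustration, and the multi-block scheduling and KV-cache refresh mentioned in the abstract are omitted. The sketch only shows how several masked positions might be unmasked in a single step when the model's predictive distribution at each of them has low entropy.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_positions(masked_probs, threshold):
    """Pick masked positions that look safe to decode in parallel.

    masked_probs: dict mapping a masked position to its predicted
    probability list over the vocabulary (hypothetical input format).
    threshold: assumed entropy cutoff; lower means more conservative.
    """
    entropies = {pos: token_entropy(p) for pos, p in masked_probs.items()}
    chosen = sorted(pos for pos, h in entropies.items() if h < threshold)
    # Assumed fallback: always decode at least the most confident
    # position so that every step makes progress.
    if not chosen and entropies:
        chosen = [min(entropies, key=entropies.get)]
    return chosen


def decode_step(masked_probs, threshold):
    """Return {position: argmax token id} for the selected positions."""
    decoded = {}
    for pos in select_positions(masked_probs, threshold):
        probs = masked_probs[pos]
        decoded[pos] = max(range(len(probs)), key=probs.__getitem__)
    return decoded


if __name__ == "__main__":
    example = {
        3: [0.97, 0.01, 0.01, 0.01],  # confident prediction
        4: [0.25, 0.25, 0.25, 0.25],  # maximally uncertain
        5: [0.05, 0.90, 0.05, 0.00],  # fairly confident
    }
    print(decode_step(example, threshold=0.5))
```

With a threshold of 0.5 nats, positions 3 and 5 are decoded together while position 4 stays masked for a later step. Raising the threshold decodes more tokens per step at a higher risk of errors, which is the accuracy-parallelism trade-off the abstract describes.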

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning in Healthcare