Few-Step Diffusion Language Models via Trajectory Self-Distillation

Tunyu Zhang; Xinxi Zhang; Ligong Han; Haizhou Shi; Xiaoxiao He; Zhuowei Li; Hao Wang; Kai Xu; Akash Srivastava; Chengzhi Mao; Hao Wang; Vladimir Pavlovic; Dimitris N. Metaxas

arXiv:2602.12262 · cs.CL · May 18, 2026

TL;DR

This paper introduces a self-distillation framework for diffusion large language models that significantly improves few-step decoding quality, enabling faster text generation without substantial performance loss.
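The self-distillation setup can be pictured with a toy sketch. A full-step teacher reveals tokens over T steps, and a student allowed only K steps is trained to jump from one teacher state to the state T/K steps later. Everything below is a hypothetical illustration, not code from the T3D repository: the `[M]` mask placeholder, the one-token-per-step teacher, and the helper names `teacher_trajectory` and `student_pairs` are all assumptions made for this example.

```python
from typing import List, Tuple

MASK = "[M]"  # hypothetical mask-token placeholder
State = List[str]


def teacher_trajectory(target: State, order: List[int]) -> List[State]:
    """Toy stand-in for a full-step teacher (hypothetical): start fully
    masked and reveal exactly one position per step, following `order`."""
    state = [MASK] * len(target)
    trajectory = [list(state)]  # store copies so later steps don't alias
    for pos in order:
        state[pos] = target[pos]
        trajectory.append(list(state))
    return trajectory


def student_pairs(trajectory: List[State], num_student_steps: int) -> List[Tuple[State, State]]:
    """Hypothetical pairing: a K-step student must cover the teacher's T
    steps, so each student step maps teacher state t to state t + T // K."""
    total_steps = len(trajectory) - 1
    if num_student_steps <= 0 or total_steps % num_student_steps != 0:
        raise ValueError("teacher steps must be a positive multiple of student steps")
    stride = total_steps // num_student_steps
    return [(trajectory[t], trajectory[t + stride]) for t in range(0, total_steps, stride)]


if __name__ == "__main__":
    target = ["the", "cat", "sat", "down"]
    traj = teacher_trajectory(target, order=[1, 3, 0, 2])  # 4 teacher steps
    for src, tgt in student_pairs(traj, num_student_steps=2):
        print(src, "->", tgt)
```

With a 4-step teacher and a 2-step student, the student's first target already has two tokens revealed ("cat" and "down"), so each student step must commit to several positions at once, which is exactly the regime where few-step decoding tends to degrade.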

Contribution

It proposes trajectory-level supervision, in which a few-step student learns to match the generative trajectory of a full-step teacher, combined with Direct Discriminative Optimization (DDO), to reduce the quality loss of few-step decoding in diffusion language models.
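The abstract describes DDO as a reverse-KL objective that encourages mode-seeking. The snippet below is not DDO itself; it is a generic, hand-picked toy comparison of forward and reverse KL, included only to show why a reverse-KL objective favors a student that commits to one teacher mode over one that hedges. The four-outcome teacher and both candidate students are assumptions chosen for illustration.

```python
import math
from typing import Dict, Sequence


def kl(a: Sequence[float], b: Sequence[float]) -> float:
    """KL(a || b) for discrete distributions; outcomes with a_i == 0 add 0."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)


# Toy bimodal "teacher" over four outcomes: two likely answers, two rare ones.
teacher = [0.49, 0.01, 0.01, 0.49]

# Two candidate students from a deliberately restricted family.
students: Dict[str, Sequence[float]] = {
    "spread": [0.25, 0.25, 0.25, 0.25],    # hedges across all outcomes
    "one_mode": [0.97, 0.01, 0.01, 0.01],  # commits to a single teacher mode
}

# Forward KL(teacher || student): large when the student misses teacher mass.
forward = {name: kl(teacher, q) for name, q in students.items()}
# Reverse KL(student || teacher): large when the student puts mass where
# the teacher has little, which pushes the student toward a teacher mode.
reverse = {name: kl(q, teacher) for name, q in students.items()}

if __name__ == "__main__":
    print("forward KL prefers:", min(forward, key=forward.get))
    print("reverse KL prefers:", min(reverse, key=reverse.get))
```

Here forward KL favors the hedging student (about 0.60 vs 1.57 nats), while reverse KL favors the single-mode student (about 0.62 vs 1.27 nats). The asymmetry is the intuition behind calling a reverse-KL objective mode-seeking.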

Findings

1. Substantially narrows the gap between few-step and full-step decoding.
2. Improves performance on reasoning and code-generation benchmarks.
3. The source code is publicly available at https://github.com/Tyrion58/T3D.

Abstract

Diffusion large language models (DLLMs) have emerged as powerful generative models with the promise of fast text generation through parallel decoding. However, realizing this potential in practice remains challenging: reducing the number of decoding steps typically causes a substantial degradation in output quality due to token factorization error. To alleviate this, we propose a self-distillation framework that trains a few-step student to match the generative trajectory of a full-step teacher. We theoretically and empirically show that trajectory-level supervision mitigates this factorization error, thereby enabling effective few-step decoding. We further incorporate Direct Discriminative Optimization (DDO), a reverse-KL objective that encourages mode-seeking toward the teacher's modes, yielding stronger performance on challenging reasoning tasks. Across reasoning and code-generation…
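Why fewer steps hurt can be seen with a two-token toy case. When several masked positions are unmasked in the same step, each is sampled from its own marginal, so the step effectively draws from the product of marginals rather than the joint. Measuring the gap as KL(joint || product of marginals), i.e. the mutual information between positions, is an assumption made for this sketch; the paper's formal definition of factorization error may differ. The "New York" / "Los Angeles" distribution is likewise hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Tuple

Pair = Tuple[str, str]

# Toy joint distribution over a two-token span (hypothetical example):
# the two positions are strongly coupled.
joint: Dict[Pair, float] = {("New", "York"): 0.5, ("Los", "Angeles"): 0.5}

# Per-position marginals: what each masked position "sees" when it is
# decoded independently of the other within a single step.
m1: Dict[str, float] = defaultdict(float)
m2: Dict[str, float] = defaultdict(float)
for (a, b), p in joint.items():
    m1[a] += p
    m2[b] += p

# Unmasking both positions in one step samples from the product of marginals.
product: Dict[Pair, float] = {(a, b): m1[a] * m2[b] for a in m1 for b in m2}

# Mass placed on pairs the joint never produces ("New Angeles", "Los York").
invalid_mass = sum(p for pair, p in product.items() if pair not in joint)

# Assumed measure of factorization error: KL(joint || product of marginals).
factorization_error = sum(p * math.log(p / product[pair]) for pair, p in joint.items())

# Decoding one token per step instead (sample position 1, then position 2
# conditioned on it) follows the chain rule and recovers the joint exactly.
sequential = {(a, b): m1[a] * (joint[(a, b)] / m1[a]) for (a, b) in joint}

if __name__ == "__main__":
    print(f"invalid mass: {invalid_mass:.2f}")
    print(f"KL(joint || product): {factorization_error:.4f} nats")
```

Half of the one-step probability mass lands on invalid pairs, and the KL gap is ln 2 nats, whereas the two-step chain-rule decode has no such gap. Trajectory-level supervision is meant to help a few-step student avoid paying this cost when it must fill several positions per step.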

Code & Models

Repositories

Tyrion58/T3D (GitHub)

Datasets

Tyrion279/SDAR-4B-Chat-MATH-Trajectories (dataset)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications