Self-Distilled Trajectory-Aware Boltzmann Modeling: Bridging the Training-Inference Discrepancy in Diffusion Language Models

Kecheng Chen; Ziru Liu; Xijia Tao; Hui Liu; Yibing Liu; Xinyu Fu; Shi Wu; Suiyun Zhang; Dandan Tu; Lingpeng Kong; Rui Liu; and Haoliang Li

arXiv:2605.11854·cs.CL·May 19, 2026

TL;DR

This paper introduces TABOM, a novel self-distilled trajectory-based training method for diffusion language models that aligns training with inference trajectories, leading to improved domain adaptation and reduced forgetting.

Contribution

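The paper proposes TABOM, a self-distilled, trajectory-aware Boltzmann modeling framework for post-training diffusion language models. It targets the mismatch between standard fine-tuning, which reconstructs randomly masked tokens in a single step, and inference, which follows a confidence-guided, multi-step denoising trajectory.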

Findings

01. TABOM significantly improves domain adaptation of DLMs.

02. It expands the effective knowledge boundary of models.

03. It mitigates catastrophic forgetting compared to standard fine-tuning.

Abstract

Diffusion Language Models (DLMs) have recently emerged as a promising alternative to autoregressive language models, offering stronger global awareness and highly parallel generation. However, post-training DLMs with standard Negative Evidence Lower Bound (NELBO)-based supervised fine-tuning remains inefficient: training reconstructs randomly masked tokens in a single step, whereas inference follows a confidence-guided, multi-step easy-to-hard denoising trajectory. Recent trajectory-based self-distillation methods exploit such inference trajectories mainly for sampling-step compression and acceleration, often improving decoding efficiency without substantially enhancing the model's underlying capability, and may even degrade performance under full diffusion decoding. In this work, we ask whether self-distilled trajectories can be used not merely for faster inference, but for genuine…
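The training-inference discrepancy described in the abstract can be illustrated with a toy sketch. It is not TABOM and not the authors' code: `toy_predict` is a hypothetical stand-in for a DLM that returns made-up confidences, and `random_mask`, `confidence_guided_decode`, `mask_ratio`, and `tokens_per_step` are names assumed here for illustration only. The sketch contrasts one-shot random masking (the corruption used in NELBO-style fine-tuning) with a multi-step decoder that commits its most confident predictions first.

```python
import random

MASK = "<mask>"


def random_mask(tokens, mask_ratio, rng):
    """Training-style corruption: mask a random subset of positions in one shot."""
    n_mask = max(1, round(mask_ratio * len(tokens)))
    masked_positions = set(rng.sample(range(len(tokens)), n_mask))
    return [MASK if i in masked_positions else t for i, t in enumerate(tokens)]


def toy_predict(sequence, rng):
    """Hypothetical model: a (token, confidence) guess for every masked position."""
    return {
        i: (f"tok{i}", rng.random())
        for i, t in enumerate(sequence)
        if t == MASK
    }


def confidence_guided_decode(length, tokens_per_step, rng):
    """Inference-style decoding: start fully masked and, at each step,
    commit only the most confident predictions (easy tokens first)."""
    sequence = [MASK] * length
    trajectory = []  # order in which positions were unmasked
    while MASK in sequence:
        predictions = toy_predict(sequence, rng)
        ranked = sorted(predictions, key=lambda i: predictions[i][1], reverse=True)
        for i in ranked[:tokens_per_step]:
            sequence[i] = predictions[i][0]
            trajectory.append(i)
    return sequence, trajectory


if __name__ == "__main__":
    rng = random.Random(0)
    print(random_mask(list("abcdef"), mask_ratio=0.5, rng=rng))
    print(confidence_guided_decode(length=6, tokens_per_step=2, rng=rng))
```

In the random-masking case the set of visible tokens is independent of the model, whereas in the decoding loop the visible context at each step depends on what the model was confident about earlier. That ordering, recorded here in `trajectory`, is the kind of inference-time information that single-step random masking never exposes during training.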
