DLM-One: Diffusion Language Models for One-Step Sequence Generation

Tianqi Chen; Shujian Zhang; Mingyuan Zhou

arXiv:2506.00290·cs.CL·June 3, 2025

Open Access · 3 Reviews

TL;DR

DLM-One introduces a score-distillation framework for one-step sequence generation with continuous diffusion language models, significantly improving inference speed while maintaining performance.

Contribution

It presents a novel one-step diffusion approach that eliminates iterative refinement, enabling faster language generation with continuous diffusion models.

Findings

1. Achieves up to ~500x inference speedup over the DiffuSeq teacher
2. Maintains competitive performance on the benchmark text generation tasks used to evaluate the teacher
3. Provides initial evidence of generality across multiple datasets

Abstract

This paper introduces DLM-One, a score-distillation-based framework for one-step sequence generation with continuous diffusion language models (DLMs). DLM-One eliminates the need for iterative refinement by aligning the scores of a student model's outputs in the continuous token embedding space with the score function of a pretrained teacher DLM. We investigate whether DLM-One can achieve substantial gains in sampling efficiency for language modeling. Through comprehensive experiments on DiffuSeq -- a representative continuous DLM -- we show that DLM-One achieves up to ~500x speedup in inference time while maintaining competitive performance on benchmark text generation tasks used to evaluate the teacher models. We further analyze the method's empirical behavior across multiple datasets, providing initial insights into its generality and practical applicability. Our findings position…
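The score-alignment idea in the abstract can be illustrated with a toy sketch. The code below is not the paper's objective or implementation: it assumes a 1-D Gaussian "teacher" with a known mean, a one-step "student" that outputs `theta + z`, and closed-form scores for both. The update uses a generic score-difference gradient (student score minus teacher score at a noised sample); in practice the student score would have to be estimated by an auxiliary network, and the samples would be token embeddings. All names (`teacher_score`, `student_score`, `distill`, `mu_teacher`) are hypothetical.

```python
import random


def teacher_score(x_t, sigma, mu_teacher=2.0):
    # Score of the noised teacher marginal N(mu_teacher, 1 + sigma^2).
    # Assumption: the teacher's data distribution is N(mu_teacher, 1).
    return -(x_t - mu_teacher) / (1.0 + sigma ** 2)


def student_score(x_t, sigma, theta):
    # Score of the noised student marginal N(theta, 1 + sigma^2).
    # Closed form only because this is a toy; a real setup estimates it.
    return -(x_t - theta) / (1.0 + sigma ** 2)


def distill(steps=500, batch=64, lr=0.1, seed=0):
    rng = random.Random(seed)
    theta = -3.0  # student parameter, deliberately far from the teacher mean
    for _ in range(steps):
        grad = 0.0
        for _ in range(batch):
            z = rng.gauss(0.0, 1.0)
            x = theta + z                      # one-step generation, dx/dtheta = 1
            sigma = rng.uniform(0.1, 1.0)      # random noise level
            x_t = x + sigma * rng.gauss(0.0, 1.0)
            # Score-difference gradient; in this linear toy the noise cancels
            # and the term equals (theta - mu_teacher) / (1 + sigma^2).
            grad += student_score(x_t, sigma, theta) - teacher_score(x_t, sigma)
        theta -= lr * grad / batch
    return theta


if __name__ == "__main__":
    print(distill())  # theta moves toward the teacher mean
```

Once trained, the student draws a sample with a single evaluation of `theta + z`, which is the source of the speedup over an iterative sampler.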

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 4

Strengths

1. The proposed method, DLM-One, introduces a novel approach to single-step sequence generation using continuous diffusion language models by leveraging score-based distillation techniques.
2. The authors conduct experiments across three S2S tasks, demonstrating the effectiveness of DLM-One compared to baselines. They also provide thorough analysis of the trade-off between quality and diversity in single-step generation, highlighting potential areas for future improvements.

Weaknesses

1. Lack of baselines. The paper only discusses one outdated baseline, DiffuSeq (2022), and does not discuss other acceleration techniques.
2. The generation tasks are overly simplistic. The paper does not explicitly discuss the challenges associated with scaling the proposed method to larger datasets or more complex tasks.

Reviewer 02 · Rating 6 · Confidence 4

Strengths

1. This paper proposes a practical distillation framework for training continuous diffusion language models for one-step sequence generation (DLM-One), without the need for iterative refinement during generation.
2. Three experiments on benchmarks (QQP, Quasar-T, Wiki-Auto) demonstrate competitive BLEU, ROUGE, and BERTScore compared to DiffuSeq, showing the effectiveness and validity of the framework.
3. This method reduces inference cost (up to 500× speedup) without large quality degradation.

Weaknesses

1. Most of the components (score distillation, adversarial stabilization, two-stage optimization) are transferred from the vision domain, which limits the novelty of this paper.
2. All experiments depend on one teacher model (DiffuSeq). The results may not generalize to other DLMs.
3. Lack of experiments on classic generation tasks that require strict semantic evaluation, such as translation.
4. Since degeneration is mentioned, there is limited qualitative or quantitative analysis of when or why

Reviewer 03 · Rating 4 · Confidence 2

Strengths

* The paper's focus on achieving one-step sequence generation with diffusion models is well-grounded and addresses a clear need for improved inference efficiency.
* The work demonstrates a strategic adaptation of distillation techniques from vision to language, validating its effectiveness for text generation.
* Through comprehensive ablation studies and discussion, the authors provide convincing verification for their data distillation approach with DiffuSeq.

Weaknesses

* All experiments in the paper were conducted using DiffuSeq. It remains unclear whether the proposed method can generalize effectively to other diffusion-based language models.
* The empirical comparisons are primarily made against the teacher model. It would be valuable to include comparisons with other accelerated generation baselines and provide an analysis discussing the optimal model selection on the performance-efficiency trade-off.

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems · Model-Driven Software Engineering Techniques