Diffusion LLM with Native Variable Generation Lengths: Let [EOS] Lead the Way

Yicun Yang; Cong Wang; Shaobo Wang; Zichen Wen; Biqing Qi; Hanlin Xu; Linfeng Zhang

arXiv:2510.24605 · cs.CL · October 29, 2025

TL;DR

This paper introduces a diffusion-based large language model capable of native variable-length text generation by predicting the end-of-sequence token, significantly improving inference speed and flexibility over fixed-length diffusion models.

Contribution

The paper proposes dLLM-Var, a diffusion LLM trained to predict the [EOS] token accurately, so that the generation length is determined during decoding rather than fixed in advance as a hyperparameter.

Findings

01. 30.1x speedup over traditional dLLM inference

02. 2.4x faster than autoregressive models like Qwen and Llama

03. Achieves higher accuracy and practical inference speed

Abstract

Diffusion-based large language models (dLLMs) have shown substantial potential for parallel text generation, which may enable more efficient generation than autoregressive models. However, current dLLMs suffer from fixed generation lengths: the generation length must be set as a hyperparameter before decoding, which hurts both efficiency and flexibility. To address these problems, we propose training a diffusion LLM with native variable generation lengths, abbreviated as dLLM-Var. Concretely, we train the model to accurately predict the [EOS] token in the generated text, which lets a dLLM natively perform inference in a block diffusion manner while retaining global bi-directional (full) attention and high parallelism. Experiments on standard benchmarks demonstrate that our method achieves a…
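To make the idea of block-wise decoding that stops at [EOS] concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: the token ids `EOS` and `MASK`, the function `decode_until_eos`, and the stand-in `toy_denoiser` are all hypothetical names introduced for illustration, and the "denoiser" simply copies from a known answer instead of running a model. A real dLLM would unmask each block over several denoising steps.

```python
from typing import Callable, List

EOS = 0    # hypothetical token id for [EOS]
MASK = -1  # hypothetical token id for a masked position

Denoiser = Callable[[List[int], int], List[int]]


def decode_until_eos(denoise_block: Denoiser, prompt: List[int],
                     block_size: int = 4, max_blocks: int = 8) -> List[int]:
    """Append masked blocks one at a time and stop once a block contains EOS."""
    seq = list(prompt)
    for _ in range(max_blocks):
        start = len(seq)
        seq.extend([MASK] * block_size)   # add a new fully masked block
        seq = denoise_block(seq, start)   # fill every position from `start` on
        block = seq[start:]
        if EOS in block:
            # Keep everything up to and including the first EOS in this block.
            return seq[: start + block.index(EOS) + 1]
    return seq  # no EOS predicted within the block budget


def toy_denoiser(prompt_len: int, answer: List[int]) -> Denoiser:
    """Stand-in for a model: copies from `answer`, padding with EOS past its end."""
    def fill(seq: List[int], start: int) -> List[int]:
        out = list(seq)
        for i in range(start, len(out)):
            j = i - prompt_len
            out[i] = answer[j] if j < len(answer) else EOS
        return out
    return fill


prompt = [5, 6]
answer = [7, 8, 9, 10, 11, EOS]
result = decode_until_eos(toy_denoiser(len(prompt), answer), prompt)
print(result)
```

In this toy run the answer needs two blocks of four tokens, and the output is cut at the first [EOS] rather than padded out to a preset length. The `max_blocks` cap is only a safety bound added for the sketch, in case [EOS] is never predicted.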
