Diffusion LLM with Native Variable Generation Lengths: Let [EOS] Lead the Way
Yicun Yang, Cong Wang, Shaobo Wang, Zichen Wen, Biqing Qi, Hanlin Xu, Linfeng Zhang

TL;DR
This paper introduces a diffusion-based large language model capable of native variable-length text generation by predicting the end-of-sequence token, significantly improving inference speed and flexibility over fixed-length diffusion models.
Contribution
The paper proposes a novel diffusion LLM with native variable generation lengths, enabling more flexible and efficient text generation by predicting the [EOS] token during diffusion.
Findings
30.1x speedup over traditional dLLM inference
2.4x faster than autoregressive models like Qwen and Llama
Achieves higher accuracy and practical inference speed
Abstract
Diffusion-based large language models (dLLMs) have shown substantial potential for parallel text generation, which may enable more efficient generation than autoregressive models. However, current dLLMs suffer from fixed generation lengths: the generation length must be set as a hyper-parameter before decoding, which hurts both efficiency and flexibility. To solve these problems, we propose training a diffusion LLM with native variable generation lengths, abbreviated as dLLM-Var. Concretely, we train the model to accurately predict the [EOS] token in the generated text, which enables a dLLM to natively infer in a block diffusion manner while retaining global bi-directional (full) attention and high parallelism. Experiments on standard benchmarks demonstrate that our method achieves a…
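To make the idea of [EOS]-terminated block decoding concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: the token ids `MASK` and `EOS`, the functions `decode_variable_length` and `make_toy_denoiser`, and the parameters `block_size` and `max_blocks` are all hypothetical names chosen for illustration, and the toy denoiser fills a block in one shot where a real dLLM would run several denoising steps. The sketch only shows the control flow the abstract implies: append a masked block, fill it in, and stop as soon as the block contains [EOS], so the output length never has to be fixed in advance.

```python
from typing import Callable, List

MASK = -1  # hypothetical id for a masked (not yet generated) position
EOS = 0    # hypothetical id for the end-of-sequence token

Denoiser = Callable[[List[int], int], List[int]]


def decode_variable_length(
    prompt: List[int],
    denoise_block: Denoiser,
    block_size: int = 4,
    max_blocks: int = 8,
) -> List[int]:
    """Generate block by block and stop at the first EOS.

    `denoise_block(seq, start)` is assumed to return a copy of `seq`
    with every MASK at index >= start replaced by a token.
    """
    seq = list(prompt)
    for _ in range(max_blocks):
        start = len(seq)
        seq.extend([MASK] * block_size)   # append one fully masked block
        seq = denoise_block(seq, start)   # fill the block (toy: one shot)
        block = seq[start:]
        if EOS in block:
            # Keep everything up to and including the first EOS.
            return seq[: start + block.index(EOS) + 1]
    return seq  # no EOS within max_blocks; return what was generated


def make_toy_denoiser(target: List[int]) -> Denoiser:
    """Toy stand-in for a model: copies tokens from a fixed target,
    and emits EOS for any position past the end of the target."""
    def denoise(seq: List[int], start: int) -> List[int]:
        out = list(seq)
        for i in range(start, len(out)):
            if out[i] == MASK:
                out[i] = target[i] if i < len(target) else EOS
        return out
    return denoise
```

With a prompt of two tokens and a target whose [EOS] falls in the second block, the loop runs twice and returns a sequence truncated at [EOS] instead of padding out to a preset length.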
