Diffusion Language Models Generation Can Be Halted Early
Sofia Maria Lo Cicero Vaina, Nikita Balagansky, Daniil Gavrilov

TL;DR
This paper introduces a method for halting diffusion language model generation early, reducing generation time by 10-40% without sacrificing output quality.
Contribution
The authors propose a novel adaptive halting technique for diffusion language models, enabling faster generation while maintaining quality, addressing a key performance gap with autoregressive models.
Findings
Generation time reduced by 10-40%
Models can be halted early without quality loss
Applicable to multiple diffusion language models
Abstract
Diffusion language models (DLMs) are a promising avenue for text generation due to their practical properties for tractable, controllable generation. They also have the advantage of not having to predict text autoregressively. However, despite these notable features, DLMs have not yet reached the performance levels of their autoregressive counterparts. One way to reduce the performance gap between these two types of language models is to speed up DLM generation. In this work, we therefore propose a novel methodology to address this issue. It enables the execution of more generation steps within a given time frame, leading to higher-quality outputs. Specifically, our methods estimate the completeness of a DLM's text generation and allow adaptive halting of the generation process. We evaluate our methods on Plaid, SSD, and CDCD DLMs and create a cohesive perspective on their…
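The abstract describes estimating how complete a generation is and halting adaptively, but this excerpt does not say which completeness estimate the authors use. The sketch below is only an illustration under stated assumptions: it uses a hypothetical "patience" rule (stop once the predicted tokens have not changed for a few consecutive steps), and `generate_with_halting`, `toy_step`, and `TARGET` are made-up names standing in for a real DLM denoising step.

```python
from typing import Callable, List, Tuple

def generate_with_halting(
    step_fn: Callable[[List[int], int], List[int]],
    init_tokens: List[int],
    max_steps: int,
    patience: int,
) -> Tuple[List[int], int]:
    """Run at most max_steps denoising steps.

    Halts early once the predicted tokens have stayed identical for
    `patience` consecutive steps (an assumed completeness criterion,
    not necessarily the one used in the paper).
    Returns the final tokens and the number of steps actually run.
    """
    tokens = list(init_tokens)
    unchanged = 0
    for step in range(1, max_steps + 1):
        new_tokens = step_fn(tokens, step)
        # Count consecutive steps with no change; reset on any change.
        unchanged = unchanged + 1 if new_tokens == tokens else 0
        tokens = new_tokens
        if unchanged >= patience:
            return tokens, step
    return tokens, max_steps

# Hypothetical stand-in for a DLM: at step s, the first s positions
# are fixed to their final value; later positions are left untouched.
TARGET = [3, 1, 4, 1, 5]

def toy_step(tokens: List[int], step: int) -> List[int]:
    return [TARGET[i] if i < step else tokens[i] for i in range(len(tokens))]

tokens, steps_used = generate_with_halting(
    toy_step, init_tokens=[0] * len(TARGET), max_steps=20, patience=3
)
print(tokens, steps_used)
```

In this toy run the output stops changing after step 5, so the loop halts at step 8 instead of running all 20 steps. A larger `patience` trades some of that saving for more confidence that the text has really converged.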
Taxonomy
Topics: Language and cultural evolution
Methods: Convolution · Non Maximum Suppression · 1x1 Convolution · SSD · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Diffusion
