Promises, Outlooks and Challenges of Diffusion Language Modeling

Justin Deschenaux; Caglar Gulcehre

arXiv:2406.11473·cs.CL·July 11, 2024

TL;DR

This paper evaluates the Score Entropy Discrete Diffusion (SEDD) approach as an alternative to autoregressive language models, highlighting its comparable performance, improved inference efficiency, and current limitations in conditional generation.

Contribution

It provides an empirical assessment of SEDD, demonstrating its potential advantages over autoregressive models and identifying areas for improvement.

Findings

1. SEDD generally matches autoregressive models in perplexity and on benchmarks such as HellaSwag, ARC, and WinoGrande.
2. SEDD can be up to 4.5 times more efficient in inference than GPT-2.
3. SEDD is slightly weaker than GPT-2 in conditional generation with short prompts.

Abstract

Modern autoregressive Large Language Models (LLMs) have achieved outstanding performance on NLP benchmarks and are deployed in the real world. However, they still suffer from limitations of the autoregressive training paradigm. For example, autoregressive token generation is notably slow and can be prone to exposure bias. Diffusion-based language models were proposed as an alternative to autoregressive generation to address some of these limitations. We evaluate the recently proposed Score Entropy Discrete Diffusion (SEDD) approach and show that it is a promising alternative to autoregressive generation, though it also has some shortcomings. We empirically demonstrate the advantages and challenges of SEDD, and observe that SEDD generally matches autoregressive models in perplexity and on benchmarks such as HellaSwag, ARC, or WinoGrande. Additionally, we show that in terms…

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Attention Is All You Need · Cosine Annealing · Layer Normalization · Byte Pair Encoding · Attention Dropout · Weight Decay · Dropout · Adam · Linear Warmup With Cosine Annealing