Simplified and Generalized Masked Diffusion for Discrete Data

Jiaxin Shi; Kehang Han; Zhe Wang; Arnaud Doucet; Michalis K. Titsias

arXiv:2406.04329 · cs.LG · January 17, 2025 · 3 cites

TL;DR

This paper introduces a simple, general framework for masked diffusion models on discrete data, built on a continuous-time variational objective that reduces to a weighted integral of cross-entropy losses. Models trained with it surpass prior diffusion language models on perplexity and report state-of-the-art bits per dimension on image benchmarks.

Contribution

The work presents a simple, general framework for masked diffusion models, clarifies their theoretical foundation, and demonstrates superior performance on language and image benchmarks.

Findings

01 Outperforms prior diffusion language models on perplexity and zero-shot tasks.

02 Achieves state-of-the-art bits per dimension on CIFAR-10 and ImageNet 64x64.

03 Provides a unified, theoretically grounded approach to masked diffusion modeling.

Abstract

Masked (or absorbing) diffusion is actively explored as an alternative to autoregressive models for generative modeling of discrete data. However, existing work in this area has been hindered by unnecessarily complex model formulations and unclear relationships between different perspectives, leading to suboptimal parameterization, training objectives, and ad hoc adjustments to counteract these issues. In this work, we aim to provide a simple and general framework that unlocks the full potential of masked diffusion models. We show that the continuous-time variational objective of masked diffusion models is a simple weighted integral of cross-entropy losses. Our framework also enables training generalized masked diffusion models with state-dependent masking schedules. When evaluated by perplexity, our models trained on OpenWebText surpass prior diffusion language models at GPT-2 scale…
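The abstract's claim that the objective is "a simple weighted integral of cross-entropy losses" can be illustrated with a minimal Monte Carlo sketch. The page does not spell out the weight or the schedule, so everything below is an assumption for illustration: a linear masking schedule (each token is masked with probability t), a 1/t weight on the cross-entropy of masked positions, a hypothetical `MASK` id, and a uniform stand-in predictor in place of a trained network. It is not the code from the official repository.

```python
import math
import random

MASK = -1  # hypothetical id for the mask token (assumption)


def mask_tokens(tokens, t, rng):
    """Forward process under an assumed linear schedule:
    each token is independently replaced by MASK with probability t."""
    return [MASK if rng.random() < t else tok for tok in tokens]


def weighted_ce_loss(tokens, noisy, probs, t):
    """Single-sample loss at time t: cross-entropy summed over masked
    positions only, scaled by the assumed 1/t weight.
    probs[i][v] is the predicted probability of token v at position i."""
    ce = sum(-math.log(probs[i][tok])
             for i, tok in enumerate(tokens) if noisy[i] == MASK)
    return ce / t


def uniform_model(noisy, vocab_size):
    """Stand-in predictor (hypothetical): uniform over the vocabulary."""
    return [[1.0 / vocab_size] * vocab_size for _ in noisy]


def estimate_loss(tokens, vocab_size, num_samples=1000, seed=0):
    """Monte Carlo estimate of the time integral: draw t uniformly,
    mask the sequence, and average the weighted cross-entropy."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        t = rng.uniform(1e-3, 1.0)  # stay away from t = 0 to avoid dividing by zero
        noisy = mask_tokens(tokens, t, rng)
        probs = uniform_model(noisy, vocab_size)
        total += weighted_ce_loss(tokens, noisy, probs, t)
    return total / num_samples
```

Under these assumptions, a sequence masked at rate t has about t·N masked positions, so the 1/t weight keeps the expected loss on the same scale at every t. A real model would replace `uniform_model` with a network that conditions on the unmasked tokens.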

Code & Models

Repositories

google-deepmind/md4 · JAX · Official

Videos

Simplified and Generalized Masked Diffusion for Discrete Data · SlidesLive

Taxonomy

Topics: Image and Signal Denoising Methods

Methods: Attention Is All You Need · Cosine Annealing · Layer Normalization · Weight Decay · Linear Warmup With Cosine Annealing · Linear Layer · Byte Pair Encoding · Adam · Attention Dropout