Variational Autoencoding Discrete Diffusion with Enhanced Dimensional Correlations Modeling

Tianyu Xie; Shuchen Xue; Zijin Feng; Tianyang Hu; Jiacheng Sun; Zhenguo Li; Cheng Zhang

arXiv:2505.17384·cs.LG·April 15, 2026

TL;DR

This paper introduces Variational Autoencoding Discrete Diffusion (VADD), a framework that enhances discrete diffusion models by using latent variables to model inter-dimensional correlations. It improves sample quality, especially when few denoising steps are used.

Contribution

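VADD incorporates latent variable modeling into discrete diffusion so that correlations among dimensions are captured implicitly. An auxiliary recognition model enables stable training by maximizing variational lower bounds, and sample quality improves over masked diffusion model (MDM) baselines.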

Findings

01. VADD outperforms MDM baselines in sample quality.

02. The improvements are significant when few denoising steps are used.

03. VADD is effective on 2D toy data, images, and text.

Abstract

Discrete diffusion models have recently shown great promise for modeling complex discrete data, with masked diffusion models (MDMs) offering a compelling trade-off between quality and generation speed. MDMs denoise by progressively unmasking multiple dimensions from an all-masked input, but their performance can degrade when few denoising steps are used because dependencies among dimensions are modeled only to a limited extent. In this paper, we propose Variational Autoencoding Discrete Diffusion (VADD), a novel framework that enhances discrete diffusion with latent variable modeling to implicitly capture correlations among dimensions. By introducing an auxiliary recognition model, VADD enables stable training by maximizing variational lower bounds and performing amortized inference over the training set. Our approach retains the efficiency of traditional MDMs while significantly improving sample quality,…
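
The dependency issue described in the abstract can be illustrated with a toy sketch. It is not the paper's model or training procedure. It assumes, as is typical for MDMs, that dimensions unmasked in the same step are sampled independently from per-dimension distributions. The target distribution, the helper names (`factorized`, `latent_mixture`, `total_variation`), and the two-state latent variable are all hypothetical choices made for illustration.

```python
import itertools

# Toy target over two binary dimensions that always agree:
# p(0, 0) = p(1, 1) = 0.5, and the mixed outcomes have probability 0.
target = {(0, 0): 0.5, (1, 1): 0.5, (0, 1): 0.0, (1, 0): 0.0}


def marginal(dist, dim):
    """Return P(x[dim] = 1) under a joint distribution over binary tuples."""
    return sum(p for x, p in dist.items() if x[dim] == 1)


def factorized(p1s):
    """Joint distribution when each dimension is sampled independently.

    p1s[i] is the probability that dimension i equals 1.
    """
    dist = {}
    for x in itertools.product([0, 1], repeat=len(p1s)):
        prob = 1.0
        for xi, p1 in zip(x, p1s):
            prob *= p1 if xi == 1 else 1.0 - p1
        dist[x] = prob
    return dist


def latent_mixture(prior, p1s_given_z):
    """Joint distribution of x when z ~ prior and x is factorized given z."""
    n_dims = len(p1s_given_z[0])
    dist = {x: 0.0 for x in itertools.product([0, 1], repeat=n_dims)}
    for pz, p1s in zip(prior, p1s_given_z):
        for x, p in factorized(p1s).items():
            dist[x] += pz * p
    return dist


def total_variation(p, q):
    """Total variation distance between two distributions on the same support."""
    return 0.5 * sum(abs(p[x] - q[x]) for x in p)


# Unmasking both dimensions in one step with the exact marginals still
# puts mass on the impossible outcomes (0, 1) and (1, 0).
one_step = factorized([marginal(target, 0), marginal(target, 1)])

# Conditioning on a (hypothetical) binary latent z keeps each step
# factorized while the marginal over z recovers the correlation.
with_latent = latent_mixture([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]])

print("one-step factorized TV:", total_variation(target, one_step))
print("latent-conditioned TV: ", total_variation(target, with_latent))
```

In this sketch the factorized one-step sampler assigns probability 0.25 to each of the four outcomes, so half of its mass falls on configurations the target never produces, while the latent-conditioned sampler matches the target exactly. Real data would need a learned, higher-dimensional latent and a learned denoiser, which this example does not attempt to model.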
