Latent-Augmented Discrete Diffusion Models
Dario Shariatian, Alain Durmus, Umut Simsekli, Stefano Peluchetti

TL;DR
This paper introduces Latent-Augmented Discrete Diffusion (LADD), which augments discrete diffusion models with a learnable latent channel to improve language generation quality, particularly when few sampling steps are used.
Contribution
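LADD adds a learnable auxiliary latent channel to discrete diffusion models and runs diffusion over the joint (token, latent) space. Denoising is either joint (tokens and latents together) or sequential (latents are resolved first, then tokens are sampled conditionally on them).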
Findings
LADD outperforms state-of-the-art masked discrete diffusion baselines on unconditional generation metrics.
LADD remains effective at lower sampling budgets, where fewer denoising steps mean more tokens must be unmasked per step.
Both continuous and discrete latent variants of LADD demonstrate improved results.
Abstract
Discrete diffusion models have emerged as a powerful class of models and a promising route to fast language generation, but practical implementations typically rely on factored reverse transitions ignoring cross-token dependencies and degrading few-step performance. We propose Latent-Augmented Discrete Diffusion (LADD), which introduces a learnable auxiliary latent channel and performs diffusion over the joint (token, latent) space. The latent variables provide an intermediate representation expressing joint structure while preserving tractable parameterizations. We instantiate LADD with continuous latents (Co-LADD) and discrete latents (Di-LADD), and study two inference schedules: a joint diffusion that denoises data and latents together, and a sequential diffusion that first resolves latents and then samples tokens conditionally. We derive ELBO-style objectives and analyze design…
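The limitation of factored reverse transitions described in the abstract can be illustrated with a toy example. The sketch below is not the paper's model: it assumes a hypothetical two-token binary vocabulary and a hand-picked latent variable z, and all names (joint, factored, latent_mixture, total_variation) are made up for illustration. It compares a product of per-token marginals with a mixture that is factored only conditionally on z.

```python
import numpy as np

# Toy target over two binary tokens (illustrative assumption):
# only the sequences "00" and "11" occur, each with probability 0.5.
joint = np.array([[0.5, 0.0],
                  [0.0, 0.5]])  # joint[x1, x2]

# Factored model: product of per-token marginals, ignoring dependencies.
p_x1 = joint.sum(axis=1)
p_x2 = joint.sum(axis=0)
factored = np.outer(p_x1, p_x2)

# Latent-augmented model: z is uniform over {0, 1}, and the tokens are
# independent *given* z. Each conditional is still fully factored.
p_z = np.array([0.5, 0.5])
p_x1_given_z = np.array([[1.0, 0.0],
                         [0.0, 1.0]])  # indexed [z, x1]
p_x2_given_z = np.array([[1.0, 0.0],
                         [0.0, 1.0]])  # indexed [z, x2]
latent_mixture = np.einsum("z,za,zb->ab", p_z, p_x1_given_z, p_x2_given_z)


def total_variation(p, q):
    """Total variation distance between two distributions on the same support."""
    return 0.5 * np.abs(p - q).sum()


tv_factored = total_variation(joint, factored)
tv_latent = total_variation(joint, latent_mixture)

print("TV(joint, factored):      ", tv_factored)
print("TV(joint, latent mixture):", tv_latent)
```

In this toy setting the factored model puts half of its mass on the impossible sequences "01" and "10", while marginalizing over z recovers the target exactly even though every conditional remains factored. This is only meant to convey the intuition of why a latent channel can express joint structure while keeping the parameterization tractable.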
