DisCo-Diff: Enhancing Continuous Diffusion Models with Discrete Latents

Yilun Xu; Gabriele Corso; Tommi Jaakkola; Arash Vahdat; Karsten Kreis

arXiv:2407.03300 · cs.LG · July 4, 2024

TL;DR

DisCo-Diff introduces discrete latent variables into diffusion models, simplifying the learning process and improving performance across various data synthesis tasks without relying on pre-trained networks.

Contribution

The paper presents a novel framework that integrates learnable discrete latents into diffusion models, enhancing their ability to model complex data distributions.

Findings

1. DisCo-Diff achieves state-of-the-art FID scores on ImageNet-64/128.
2. Discrete latents reduce the complexity of the diffusion process.
3. Model performance improves across toy data, image synthesis, and molecular docking.

Abstract

Diffusion models (DMs) have revolutionized generative learning. They utilize a diffusion process to encode data into a simple Gaussian distribution. However, encoding a complex, potentially multimodal data distribution into a single continuous Gaussian distribution arguably represents an unnecessarily challenging learning problem. We propose Discrete-Continuous Latent Variable Diffusion Models (DisCo-Diff) to simplify this task by introducing complementary discrete latent variables. We augment DMs with learnable discrete latents, inferred with an encoder, and train DM and encoder end-to-end. DisCo-Diff does not rely on pre-trained networks, making the framework universally applicable. The discrete latents significantly simplify learning the DM's complex noise-to-data mapping by reducing the curvature of the DM's generative ODE. An additional autoregressive transformer models the…
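
To make the intuition behind the discrete latents concrete, here is a minimal one-dimensional sketch. It is not the authors' implementation: it assumes a two-component Gaussian mixture as data, assumes the discrete latent simply picks the mixture component, and uses closed-form optimal denoisers in place of trained networks. All function names and parameter values are hypothetical. The second-difference measure below is only a rough proxy for how much a denoiser bends, not the ODE curvature measure referred to in the abstract.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1D Gaussian N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def conditional_denoiser(x_t, sigma, mean, data_var):
    """E[x0 | x_t, z] when component z is N(mean, data_var) and x_t = x0 + sigma * eps.

    For a single Gaussian component this is linear in x_t.
    """
    gain = data_var / (data_var + sigma ** 2)
    return mean + gain * (x_t - mean)


def unconditional_denoiser(x_t, sigma, means, weights, data_var):
    """E[x0 | x_t] for the full mixture: posterior-weighted per-component denoisers."""
    noisy_var = data_var + sigma ** 2
    likelihoods = [w * gaussian_pdf(x_t, m, noisy_var) for m, w in zip(means, weights)]
    total = sum(likelihoods)
    return sum(
        (lik / total) * conditional_denoiser(x_t, sigma, m, data_var)
        for lik, m in zip(likelihoods, means)
    )


def max_second_difference(f, xs):
    """Largest absolute second finite difference of f on the grid xs (zero for a line)."""
    return max(
        abs(f(xs[i - 1]) - 2.0 * f(xs[i]) + f(xs[i + 1]))
        for i in range(1, len(xs) - 1)
    )


# Assumed toy setup: two equally weighted modes at -2 and +2, noise level sigma = 1.
means, weights, data_var, sigma = [-2.0, 2.0], [0.5, 0.5], 0.1, 1.0
xs = [-4.0 + 0.1 * i for i in range(81)]

cond_bend = max_second_difference(
    lambda x: conditional_denoiser(x, sigma, means[1], data_var), xs
)
uncond_bend = max_second_difference(
    lambda x: unconditional_denoiser(x, sigma, means, weights, data_var), xs
)

print("conditional bend:", cond_bend)
print("unconditional bend:", uncond_bend)
```

In this toy setting the denoiser conditioned on the latent is a straight line, while the unconditional denoiser must bend sharply between the two modes. That is the sense in which a discrete latent can hand the continuous model an easier function to learn. In DisCo-Diff the latents are learned by an encoder rather than fixed to known mixture components.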

Code & Models

Repositories

gcorso/disco-diffdock (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Diffusion