Improving Vector-Quantized Image Modeling with Latent Consistency-Matching Diffusion
Bac Nguyen, Chieh-Hsin Lai, Yuhta Takida, Naoki Murata, Toshimitsu Uesaka, Stefano Ermon, Yuki Mitsufuji

TL;DR
This paper introduces VQ-LCMD, a novel continuous-space latent diffusion framework that jointly learns embeddings and diffusion models, improving image generation quality and stability over previous methods.
Contribution
The paper proposes VQ-LCMD, a new training approach combining joint embedding-diffusion variational bounds with a consistency-matching loss for stable end-to-end training.
Findings
VQ-LCMD outperforms previous discrete-state models on benchmarks.
VQ-LCMD achieves an FID of 6.81 on ImageNet with 50 sampling steps.
The method demonstrates improved training stability and image generation quality.
Abstract
By embedding discrete representations into a continuous latent space, we can leverage continuous-space latent diffusion models to handle generative modeling of discrete data. However, despite their initial success, most latent diffusion methods rely on fixed pretrained embeddings, limiting the benefits of joint training with the diffusion model. While jointly learning the embedding (via reconstruction loss) and the latent diffusion model (via score matching loss) could enhance performance, end-to-end training risks embedding collapse, degrading generation quality. To mitigate this issue, we introduce VQ-LCMD, a continuous-space latent diffusion framework within the embedding space that stabilizes training. VQ-LCMD uses a novel training objective combining the joint embedding-diffusion variational lower bound with a consistency-matching (CM) loss, alongside a shifted cosine noise…
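To make the embedding idea and the collapse risk concrete, the following is a minimal illustrative sketch, not the paper's method. It embeds discrete tokens with a small hand-written codebook, applies forward noise under the standard (unshifted) cosine schedule, and decodes by nearest embedding. The codebook values and the helper names (`cosine_alpha_bar`, `embed`, `add_noise`, `nearest_token`) are all assumptions made for illustration.

```python
import math
import random

def cosine_alpha_bar(t: float, s: float = 0.008) -> float:
    """Signal fraction at time t in [0, 1] under the standard cosine schedule.
    Assumption: this is the common unshifted form, used only for illustration."""
    f = math.cos((t + s) / (1 + s) * math.pi / 2) ** 2
    f0 = math.cos(s / (1 + s) * math.pi / 2) ** 2
    return f / f0

def embed(tokens, codebook):
    """Map discrete token ids to continuous vectors via table lookup."""
    return [list(codebook[k]) for k in tokens]

def add_noise(latents, t, rng):
    """Forward diffusion: z_t = sqrt(a) * z_0 + sqrt(1 - a) * eps."""
    a = cosine_alpha_bar(t)
    return [[math.sqrt(a) * x + math.sqrt(1 - a) * rng.gauss(0.0, 1.0) for x in z]
            for z in latents]

def nearest_token(z, codebook):
    """Decode a continuous vector to the id of the closest codebook entry."""
    def sq_dist(k):
        return sum((a - b) ** 2 for a, b in zip(z, codebook[k]))
    return min(range(len(codebook)), key=sq_dist)

# Hypothetical toy codebooks: one well separated, one collapsed to a point.
separated = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
collapsed = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
tokens = [2, 0, 1]
rng = random.Random(0)

# With no noise (t = 0), the separated codebook round-trips the tokens.
clean = add_noise(embed(tokens, separated), 0.0, rng)
decoded_separated = [nearest_token(z, separated) for z in clean]

# With a collapsed codebook every token maps to the same point, so the
# token identity is lost even before any noise is added.
flat = add_noise(embed(tokens, collapsed), 0.0, rng)
decoded_collapsed = [nearest_token(z, collapsed) for z in flat]

print(decoded_separated, decoded_collapsed)
```

The collapsed case shows why a degenerate embedding is harmful: denoising becomes trivial because every latent is identical, yet nothing about the original tokens can be recovered from it.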
Taxonomy
Topics: Statistical Methods and Inference
Methods: Diffusion · Latent Diffusion Model
