Generalized Denoising Diffusion Codebook Models (gDDCM): Tokenizing images using a pre-trained diffusion model
Fei Kong

TL;DR
This paper introduces gDDCM, a unified framework for image tokenization using diffusion models that improves compatibility with various diffusion architectures and enhances reconstruction quality.
Contribution
It proposes a generalized theoretical framework and a novel sampling strategy for diffusion-based image tokenization, addressing limitations of previous models.
Findings
Outperforms DDCM in reconstruction quality and perceptual fidelity
Achieves compatibility with multiple diffusion variants
Demonstrates effectiveness on CIFAR10 and LSUN datasets
Abstract
Denoising diffusion models have emerged as a dominant paradigm in image generation. Discretizing image data into tokens is a critical step for effectively integrating images with Transformer and other architectures. Although the Denoising Diffusion Codebook Models (DDCM) pioneered the use of pre-trained diffusion models for image tokenization, it strictly relies on the traditional discrete-time DDPM architecture. Consequently, it fails to adapt to modern continuous-time variants-such as Flow Matching and Consistency Models-and suffers from inefficient sampling in high-noise regions. To address these limitations, this paper proposes the Generalized Denoising Diffusion Codebook Models (gDDCM). We establish a unified theoretical framework and introduce a generic "De-noise and Back-trace" sampling strategy. By integrating a deterministic ODE denoising step with a residual-aligned noise…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsGenerative Adversarial Networks and Image Synthesis · Advanced Neuroimaging Techniques and Applications · Digital Media Forensic Detection
