DCTdiff: Intriguing Properties of Image Generative Modeling in the DCT Space
Mang Ning, Mingxiao Li, Jianlin Su, Haozhe Jia, Lanmiao Liu, Martin Beneš, Wenshuo Chen, Albert Ali Salah, Itir Onal Ertugrul

TL;DR
DCTdiff is a diffusion model that operates in the DCT frequency space. It outperforms pixel-based diffusion in generative quality and training efficiency, and the paper gives a theoretical argument linking image diffusion to spectral autoregression.
Contribution
It presents DCTdiff, a new frequency-space diffusion model that outperforms pixel-based and latent diffusion models in quality and efficiency, with theoretical analysis of spectral autoregression.
Findings
DCTdiff outperforms pixel-based diffusion models in quality and training efficiency.
DCTdiff scales to 512×512 resolution without the latent diffusion paradigm and beats latent diffusion (using SD-VAE) at only 1/4 of the training cost.
Theoretical proof links image diffusion to spectral autoregression.
Abstract
This paper explores image modeling from the frequency space and introduces DCTdiff, an end-to-end diffusion generative paradigm that efficiently models images in the discrete cosine transform (DCT) space. We investigate the design space of DCTdiff and reveal the key design factors. Experiments on different frameworks (UViT, DiT), generation tasks, and various diffusion samplers demonstrate that DCTdiff outperforms pixel-based diffusion models regarding generative quality and training efficiency. Remarkably, DCTdiff can seamlessly scale up to 512×512 resolution without using the latent diffusion paradigm and beats latent diffusion (using SD-VAE) with only 1/4 training cost. Finally, we illustrate several intriguing properties of DCT image modeling. For example, we provide a theoretical proof of why 'image diffusion can be seen as spectral autoregression', bridging the gap between…
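As background on what "modeling images in the DCT space" refers to, the sketch below applies an orthonormal 2D DCT-II to a single image block and inverts it. It is not the authors' pipeline: the 8×8 block size (borrowed from JPEG convention), the synthetic gradient block, and the helper names `dct_matrix`, `dct2`, and `idct2` are all assumptions made for illustration. It shows only two generic properties of the transform: it is lossless, and a smooth block concentrates its energy in a few low-frequency coefficients.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k holds the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def dct2(block):
    """2D DCT of a square block: C @ block @ C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))

def idct2(coeffs):
    """Inverse 2D DCT: C^T @ coeffs @ C (C is orthonormal)."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)

# Hypothetical 8x8 block: a smooth diagonal gradient (illustrative data only).
N = 8
block = [[float(i + j) for j in range(N)] for i in range(N)]

coeffs = dct2(block)
recon = idct2(coeffs)

# Round-trip error: the transform loses no information.
max_err = max(abs(recon[i][j] - block[i][j]) for i in range(N) for j in range(N))

# Share of total energy held by the DC term and its two nearest neighbours.
total_energy = sum(v * v for row in coeffs for v in row)
low_energy = sum(coeffs[k][l] ** 2 for k in range(N) for l in range(N) if k + l <= 1)
low_fraction = low_energy / total_energy

print("max round-trip error:", max_err)
print("low-frequency energy fraction:", low_fraction)
```

Because the basis is orthonormal, pixel-space energy equals coefficient-space energy, so the printed fraction can be read directly as the share of the block's signal carried by the three lowest frequencies.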
Taxonomy
Topics: Computer Graphics and Visualization Techniques
Methods: Diffusion · Discrete Cosine Transform
