Accurate Compression of Text-to-Image Diffusion Models via Vector Quantization
Vage Egiazarian, Denis Kuznedelev, Anton Voronov, Ruslan Svirschevski, Michael Goin, Daniil Pavlov, Dan Alistarh, Dmitry Baranchuk

TL;DR
This paper introduces a vector quantization (VQ)-based post-training quantization (PTQ) method for large text-to-image diffusion models. It compresses model weights to around 3 bits per parameter while maintaining image quality and textual alignment.
Contribution
The paper demonstrates that vector quantization can compress billion-parameter diffusion models to around 3 bits, outperforming traditional uniform scalar quantization methods.
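VQ can reach bit widths such as 3 bits because its cost per weight is set by two choices: the size of each weight group and the size of the codebook. Scalar quantization, by contrast, pays a whole number of bits per weight plus per-group scales. The sketch below shows only this general bit accounting. It is not the authors' configuration: the function names, group dimensions, codebook sizes, and 16-bit storage for codebook entries and scales are all illustrative assumptions.

```python
import math


def vq_bits_per_weight(group_dim: int, codebook_size: int, num_codebooks: int = 1,
                       num_weights: int = 0, entry_bits: int = 16) -> float:
    """Hypothetical helper: average bits per weight when every `group_dim`-sized
    weight group is stored as `num_codebooks` indices into codebooks of
    `codebook_size` entries each."""
    bits = num_codebooks * math.log2(codebook_size) / group_dim
    if num_weights:
        # Amortize the storage of the codebooks themselves
        # (codebook_size x group_dim values of `entry_bits` each) over the layer.
        bits += num_codebooks * codebook_size * group_dim * entry_bits / num_weights
    return bits


def scalar_bits_per_weight(bits: int, group_size: int, scale_bits: int = 16) -> float:
    """Hypothetical helper: uniform scalar quantization with one scale per group
    (zero-points ignored for simplicity)."""
    return bits + scale_bits / group_size


print(vq_bits_per_weight(4, 2**12))                           # 3.0
print(vq_bits_per_weight(8, 2**8, num_codebooks=3))           # 3.0
print(vq_bits_per_weight(4, 2**12, num_weights=4096 * 4096))  # 3.015625
print(scalar_bits_per_weight(4, 128))                         # 4.125
```

Under these assumptions, several (group size, codebook size, number of codebooks) combinations land on about 3 bits per weight. For a large layer, the codebook overhead becomes a small fraction of a bit.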
Findings
Models compressed to around 3 bits retain image quality similar to models compressed to 4 bits with previous techniques.
Vector quantization achieves higher compression rates than uniform scalar quantization for large diffusion models.
The approach is tailored to billion-scale models such as SDXL and SDXL-Turbo.
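At billion-parameter scale, each bit per weight corresponds to a large amount of memory. The arithmetic below estimates weight storage alone. The 2.6-billion parameter count is an illustrative assumption, not a figure taken from this page, and the estimate ignores codebooks, scales, and activations.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Storage for the weights alone, in GiB (2**30 bytes)."""
    return num_params * bits_per_weight / 8 / 2**30


NUM_PARAMS = 2.6e9  # assumption: an illustrative ~2.6B-parameter denoiser

for label, bits in [("fp16", 16), ("4-bit", 4), ("3-bit", 3)]:
    print(f"{label:>6}: {weight_memory_gib(NUM_PARAMS, bits):.2f} GiB")
```

Whatever the exact parameter count, moving from 4 to 3 bits cuts weight storage by a further 25%. Relative to fp16, 3 bits is about 5.3 times smaller (16 / 3).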
Abstract
Text-to-image diffusion models have emerged as a powerful framework for high-quality image generation given textual prompts. Their success has driven the rapid development of production-grade diffusion models that consistently increase in size and already contain billions of parameters. As a result, state-of-the-art text-to-image models are becoming less accessible in practice, especially in resource-limited environments. Post-training quantization (PTQ) tackles this issue by compressing the pretrained model weights into lower-bit representations. Recent diffusion quantization techniques primarily rely on uniform scalar quantization, which provides decent performance for models compressed to 4 bits. This work demonstrates that more versatile vector quantization (VQ) may achieve higher compression rates for large-scale text-to-image diffusion models. Specifically, we tailor vector-based…
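The abstract contrasts uniform scalar quantization, which rounds each weight to a fixed grid, with vector quantization, which maps groups of weights to codewords learned from the data. The toy NumPy comparison below runs both methods on a synthetic Gaussian matrix at about 3 bits per weight. It uses plain k-means as the VQ step and per-group min/max rounding as the scalar step. Neither is the authors' method, and the function names, matrix shape, and hyperparameters are all hypothetical.

```python
import numpy as np


def uniform_quantize(w: np.ndarray, bits: int, group_size: int) -> np.ndarray:
    """Round-to-nearest uniform scalar quantization with a min/max range per group."""
    groups = w.reshape(-1, group_size)
    lo = groups.min(axis=1, keepdims=True)
    hi = groups.max(axis=1, keepdims=True)
    levels = 2**bits - 1
    # Guard against constant groups, where hi == lo would give a zero step.
    step = np.where(hi > lo, (hi - lo) / levels, 1.0)
    codes = np.clip(np.round((groups - lo) / step), 0, levels)
    return (codes * step + lo).reshape(w.shape)


def kmeans_vq(w: np.ndarray, dim: int, codebook_size: int,
              iters: int = 20, seed: int = 0) -> np.ndarray:
    """Plain k-means VQ: each run of `dim` consecutive weights maps to its nearest codeword."""
    rng = np.random.default_rng(seed)
    vecs = w.reshape(-1, dim)
    # Initialise the codebook with randomly chosen weight vectors (fancy indexing copies).
    codebook = vecs[rng.choice(len(vecs), codebook_size, replace=False)]

    def nearest(cb: np.ndarray) -> np.ndarray:
        # Squared Euclidean distance ||x||^2 - 2 x.c + ||c||^2 to every codeword.
        d2 = (vecs**2).sum(1, keepdims=True) - 2 * vecs @ cb.T + (cb**2).sum(1)
        return d2.argmin(axis=1)

    for _ in range(iters):
        assign = nearest(codebook)
        sums = np.zeros_like(codebook)
        np.add.at(sums, assign, vecs)  # accumulate member vectors per codeword
        counts = np.bincount(assign, minlength=codebook_size)
        filled = counts > 0  # empty clusters keep their previous codeword
        codebook[filled] = sums[filled] / counts[filled][:, None]
    return codebook[nearest(codebook)].reshape(w.shape)


rng = np.random.default_rng(0)
W = rng.standard_normal((256, 512))  # synthetic stand-in for a weight matrix

# About 3 bits/weight each (ignoring scale and codebook storage):
W_sq = uniform_quantize(W, bits=3, group_size=128)  # 8 levels per scalar
W_vq = kmeans_vq(W, dim=2, codebook_size=2**6)      # 64 codewords = 6 bits per pair

mse_sq = np.mean((W - W_sq) ** 2)
mse_vq = np.mean((W - W_vq) ** 2)
print(f"uniform scalar, 3-bit: MSE = {mse_sq:.4f}")
print(f"k-means VQ, d=2, K=64: MSE = {mse_vq:.4f}")
```

The design point this sketch illustrates is that the learned 2-D codebook can adapt to the joint distribution of weight pairs, whereas the scalar grid is fixed and evenly spaced. Real diffusion-model weights and a method tuned for them will behave differently from this toy setting.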
Taxonomy
Topics: Advanced Data Compression Techniques
Methods: Diffusion
