VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers
Juncan Deng, Shuaiting Li, Zeyu Wang, Hong Gu, Kedong Xu, Kejie Huang

TL;DR
VQ4DiT introduces a fast post-training vector quantization method for diffusion transformer models, significantly reducing model size while maintaining image generation quality, enabling deployment on edge devices.
Contribution
The paper proposes VQ4DiT, a novel vector quantization approach that calibrates both the codebook and the assignments, improving quantization efficiency and performance for diffusion transformers.
Findings
Quantizes the DiT-XL/2 model to 2-bit weight precision.
Achieves state-of-the-art size-performance trade-offs.
Completes quantization in 20 minutes to 5 hours, depending on the quantization setting.
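To make the "2-bit" figure concrete: with vector quantization, every group of d weights is stored as a single index into a codebook of k entries, so the index cost is log2(k)/d bits per weight, plus a small overhead for storing the codebook itself. The sketch below computes this under assumed settings (d = 4, k = 256, FP16 codewords, a 1152 x 1152 layer); these values are illustrative choices, not configurations confirmed by the paper, and `vq_bits_per_weight` is a hypothetical helper.

```python
import math


def vq_bits_per_weight(rows: int, cols: int, sub_dim: int, codebook_size: int,
                       codeword_bits: int = 16) -> float:
    """Average storage bits per weight for a VQ-compressed rows x cols matrix.

    Assumption: each run of `sub_dim` weights becomes one index into a codebook
    of `codebook_size` entries, and codewords are stored as `codeword_bits`-bit
    floats (FP16 by default).
    """
    n_weights = rows * cols
    assert n_weights % sub_dim == 0, "weights must split evenly into sub-vectors"
    n_subvectors = n_weights // sub_dim
    # Cost of the assignments: one ceil(log2(k))-bit index per sub-vector.
    index_bits = n_subvectors * math.ceil(math.log2(codebook_size))
    # Cost of the codebook: k codewords of sub_dim floats each.
    codebook_bits = codebook_size * sub_dim * codeword_bits
    return (index_bits + codebook_bits) / n_weights


# Illustrative layer size and VQ settings (assumed, not from the paper).
bpw = vq_bits_per_weight(rows=1152, cols=1152, sub_dim=4, codebook_size=256)
print(bpw)  # 2 + 1/81, about 2.0123 bits per weight
```

The index term alone is exactly 8 / 4 = 2 bits per weight; the codebook adds only about 0.012 bits here because it is shared by the whole matrix, which is why VQ can reach such low bit widths.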
Abstract
Diffusion Transformer models (DiTs) have shifted the network architecture from traditional UNets to transformers and demonstrate exceptional capabilities in image generation. Although DiTs have been widely applied to high-definition video generation, their large parameter count hinders inference on edge devices. Vector quantization (VQ) decomposes model weights into a codebook and assignments, enabling extreme weight quantization and significantly reducing memory usage. In this paper, we propose VQ4DiT, a fast post-training vector quantization method for DiTs. We find that traditional VQ methods calibrate only the codebook and leave the assignments unchanged. As a result, weight sub-vectors can be incorrectly assigned to the same codeword, which then receives inconsistent gradients and converges to a suboptimal result. To address this challenge, VQ4DiT…
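The decomposition and the gradient problem described in the abstract can be illustrated with a small NumPy sketch. It is an assumption-laden toy, not VQ4DiT's method: it uses plain k-means (the conventional VQ baseline) to split a weight matrix into sub-vectors, a codebook, and fixed assignments, and then shows that when only the codebook is calibrated, each codeword's gradient is the sum of the gradients of all sub-vectors assigned to it, so conflicting gradients can cancel. The function names, sub-vector size, and codebook size are hypothetical choices for illustration.

```python
import numpy as np


def kmeans_vq(weight, sub_dim, k, iters=20, seed=0):
    """Plain k-means VQ: returns (codebook[k, sub_dim], assignments[n])."""
    rng = np.random.default_rng(seed)
    subvecs = weight.reshape(-1, sub_dim)  # (n, sub_dim) weight sub-vectors
    codebook = subvecs[rng.choice(len(subvecs), k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every sub-vector to every codeword.
        dists = ((subvecs[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = subvecs[assign == j]
            if len(members):  # leave empty codewords unchanged
                codebook[j] = members.mean(axis=0)
    # Final assignment consistent with the final codebook.
    dists = ((subvecs[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return codebook, dists.argmin(axis=1)


def reconstruct(codebook, assign, shape):
    """Rebuild the weight matrix by looking up each sub-vector's codeword."""
    return codebook[assign].reshape(shape)


def codebook_grad(grad_w, assign, k, sub_dim):
    """With assignments frozen, dL/dc_j = sum of dL/dw_i over sub-vectors i mapped to j."""
    g = grad_w.reshape(-1, sub_dim)
    out = np.zeros((k, sub_dim))
    np.add.at(out, assign, g)  # scatter-add each sub-vector gradient into its codeword
    return out


# Usage: compress a random 64 x 32 "weight" with 4-dim sub-vectors and 16 codewords.
w = np.random.default_rng(1).normal(size=(64, 32))
cb, a = kmeans_vq(w, sub_dim=4, k=16)
w_hat = reconstruct(cb, a, w.shape)
print("reconstruction MSE:", np.mean((w - w_hat) ** 2))

# Toy conflict: sub-vectors 0 and 1 share codeword 0 but want to move in opposite
# directions, so the codeword receives zero net gradient and does not move.
toy_assign = np.array([0, 0, 1])
toy_grad = np.array([1.0, -2.0, -1.0, 2.0, 0.5, 0.5])  # three 2-dim sub-vector gradients
print(codebook_grad(toy_grad, toy_assign, k=2, sub_dim=2))
```

In the toy case, codeword 0 cannot serve both of its sub-vectors well, and codebook-only calibration has no way to move either sub-vector to a better codeword. Revisiting the assignments themselves is the gap the abstract says VQ4DiT addresses.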
Taxonomy
Topics: Neural Networks and Applications · Advanced Memory and Neural Computing
Methods: Sparse Evolutionary Training · Diffusion
