VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers

Juncan Deng; Shuaiting Li; Zeyu Wang; Hong Gu; Kedong Xu; Kejie Huang

arXiv:2408.17131·cs.CV·September 2, 2024

TL;DR

VQ4DiT introduces a fast post-training vector quantization method for diffusion transformer models. It significantly reduces model size while largely preserving image generation quality, which makes deployment on edge devices more practical.

Contribution

The paper proposes VQ4DiT, a vector quantization approach that calibrates both the codebook and the assignments, improving quantization efficiency and the performance of quantized diffusion transformers.
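
Calibrating assignments requires relaxing the hard "nearest codeword" choice so that it can be optimized. The sketch below shows one generic way to do that: each sub-vector keeps its n nearest codewords as candidates and is reconstructed as a softmax-weighted average of them, with learnable scores. The helper names (`candidate_set`, `soft_reconstruct`, `finalize`) and the softmax weighting are illustrative assumptions, not necessarily the exact procedure VQ4DiT uses.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def candidate_set(subvec, codebook, n):
    """Indices of the n codewords nearest to subvec (Euclidean), nearest first."""
    order = sorted(range(len(codebook)), key=lambda j: sq_dist(subvec, codebook[j]))
    return order[:n]

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_reconstruct(codebook, candidates, logits):
    """Reconstruct a sub-vector as the weighted average of its candidate codewords.

    logits are hypothetical learnable scores, one per candidate; during calibration
    they (and the codewords) would be updated by gradient descent.
    """
    ratios = softmax(logits)
    d = len(codebook[0])
    return [sum(r * codebook[j][i] for r, j in zip(ratios, candidates)) for i in range(d)]

def finalize(candidates, logits):
    """Collapse to a hard assignment: keep the candidate with the highest score."""
    best = max(range(len(candidates)), key=lambda t: logits[t])
    return candidates[best]
```

With `codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0]]` and sub-vector `[0.6, 0.6]`, `candidate_set(..., n=2)` returns `[1, 0]`; equal scores reconstruct `[0.5, 0.5]`. After calibration, `finalize` keeps a single index per sub-vector, so the stored format is the same as with ordinary hard assignments.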

Findings

1. Quantizes the DiT XL/2 model to 2-bit weight precision.
2. Achieves state-of-the-art trade-offs between model size and generation performance.
3. Completes quantization of DiT XL/2 in 20 minutes to 5 hours, depending on the quantization setting.
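
The 2-bit figure is an effective cost per weight. Under vector quantization, each sub-vector of d weights stores a log2(k)-bit index into a k-entry codebook, and the codebook itself must also be stored. A back-of-the-envelope calculator, assuming FP16 codewords and hypothetical values for the codebook size k, sub-vector length d, and layer size (not the paper's reported configuration):

```python
import math

def bits_per_weight(k, d, num_weights, codebook_bits=16):
    """Effective storage bits per weight for one vector-quantized layer.

    k: codebook size, d: sub-vector length, num_weights: weights in the layer,
    codebook_bits: precision of each stored codeword element (assumed FP16).
    """
    index_bits = math.log2(k) / d                           # one index per d weights
    codebook_overhead = k * d * codebook_bits / num_weights  # codebook amortized over the layer
    return index_bits + codebook_overhead

def compression_vs_fp32(k, d, num_weights, codebook_bits=16):
    """Size reduction relative to storing every weight in FP32."""
    return 32 / bits_per_weight(k, d, num_weights, codebook_bits)
```

For a hypothetical 1152 × 1152 layer with k = 256 and d = 4, the index cost is 8 / 4 = 2 bits per weight and the FP16 codebook adds roughly 0.012 bits, about 16× smaller than FP32. Halving d to 2 with the same k doubles the index cost to 4 bits, which is why low bit widths rely on longer sub-vectors or smaller codebooks.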

Abstract

Diffusion Transformer models (DiTs) have transitioned the network architecture from traditional UNets to transformers, demonstrating exceptional capabilities in image generation. Although DiTs have been widely applied to high-definition video generation tasks, their large parameter count hinders inference on edge devices. Vector quantization (VQ) can decompose model weights into a codebook and assignments, allowing extreme weight quantization and significantly reducing memory usage. In this paper, we propose VQ4DiT, a fast post-training vector quantization method for DiTs. We found that traditional VQ methods calibrate only the codebook and leave the assignments fixed. As a result, weight sub-vectors can be incorrectly mapped to the same assignment, providing inconsistent gradients to the codebook and leading to a suboptimal result. To address this challenge, VQ4DiT…
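
The codebook-and-assignments decomposition can be illustrated with plain k-means: cut the weight matrix into length-d sub-vectors, learn a k-entry codebook, and store one index (assignment) per sub-vector. The last helper illustrates the gradient problem: with hard assignments, a codeword's gradient is the sum of its members' gradients, so members that should move in opposite directions cancel out. This is a minimal pure-Python sketch; the k-means initialization, helper names, and toy values are illustrative assumptions, not VQ4DiT's implementation.

```python
import random

def split_subvectors(weights, d):
    """Flatten a 2-D weight matrix row-wise and cut it into sub-vectors of length d."""
    flat = [w for row in weights for w in row]
    assert len(flat) % d == 0, "weight count must be divisible by d"
    return [flat[i:i + d] for i in range(0, len(flat), d)]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_codebook(subvecs, k, iters=20, seed=0):
    """Plain k-means over sub-vectors; returns (codebook, assignments)."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(subvecs, k)]  # init from random sub-vectors
    assignments = [0] * len(subvecs)
    for _ in range(iters):
        # Assignment step: each sub-vector picks its nearest codeword.
        assignments = [min(range(k), key=lambda j: sq_dist(v, codebook[j])) for v in subvecs]
        # Update step: each codeword moves to the mean of its members (empty clusters stay put).
        for j in range(k):
            members = [v for v, a in zip(subvecs, assignments) if a == j]
            if members:
                codebook[j] = [sum(c) / len(members) for c in zip(*members)]
    return codebook, assignments

def reconstruct(codebook, assignments):
    """Dequantize: replace every index with a copy of its codeword."""
    return [list(codebook[a]) for a in assignments]

def codebook_grad(subvec_grads, assignments, k, d):
    """With hard assignments, dL/dc_j sums the gradients of all sub-vectors assigned to c_j."""
    grads = [[0.0] * d for _ in range(k)]
    for g, a in zip(subvec_grads, assignments):
        for i in range(d):
            grads[a][i] += g[i]
    return grads
```

For example, `codebook_grad([[1.0, 0.0], [-1.0, 0.0]], [0, 0], k=2, d=2)` returns `[[0.0, 0.0], [0.0, 0.0]]`: two sub-vectors sharing codeword 0 pull in opposite directions, so calibrating the codebook alone cannot serve either of them.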

Videos

VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers (video on Underline)

Taxonomy

Topics: Neural Networks and Applications · Advanced Memory and Neural Computing

Methods: Sparse Evolutionary Training · Diffusion