Hardware-Friendly Static Quantization Method for Video Diffusion Transformers
Sanghyun Yi, Qingfeng Liu, Mostafa El-Khamy

TL;DR
This paper introduces a static post-training quantization method for Video Diffusion Transformers, enabling efficient deployment on resource-constrained devices while keeping video quality comparable to FP16.
Contribution
The paper presents a static quantization approach for Video Diffusion Transformers that matches the quality of dynamic quantization by combining per-step calibration with smooth-quantization techniques.
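To make the smooth-quantization idea concrete, here is a minimal sketch. It assumes the term refers to SmoothQuant-style smoothing, where per-channel factors move activation outliers into the weights before quantization; the summary above does not confirm the exact formulation. The function names, the `alpha` balance parameter, and the toy values are all hypothetical.

```python
# Sketch only: SmoothQuant-style smoothing is an assumption about what
# "smooth-quantization techniques" means here; all names are hypothetical.

def smoothing_factors(act_absmax, weight, alpha=0.5):
    """Per-input-channel factors s_j = max|X_j|^alpha / max|W_j|^(1 - alpha)."""
    factors = []
    for j, a_max in enumerate(act_absmax):
        # Row j of `weight` holds the weights of input channel j.
        w_max = max(max(abs(w) for w in weight[j]), 1e-8)
        factors.append((max(a_max, 1e-8) ** alpha) / (w_max ** (1.0 - alpha)))
    return factors

def apply_smoothing(x, weight, s):
    """Divide activations and multiply weights by s so that x @ w is unchanged."""
    x_s = [[v / s[j] for j, v in enumerate(row)] for row in x]
    w_s = [[w * s[j] for w in weight[j]] for j in range(len(weight))]
    return x_s, w_s

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# Toy data: activation channel 1 carries outliers.
x = [[0.5, 40.0], [-0.25, -32.0]]      # shape (tokens, in_channels)
w = [[0.8, -0.4], [0.02, 0.01]]        # shape (in_channels, out_channels)

act_absmax = [max(abs(row[j]) for row in x) for j in range(2)]
s = smoothing_factors(act_absmax, w)
x_s, w_s = apply_smoothing(x, w, s)
smoothed_absmax = [max(abs(row[j]) for row in x_s) for j in range(2)]
```

The layer output is mathematically unchanged, but the smoothed activations have a much narrower range, which makes a single fixed (static) activation scale less lossy.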
Findings
Static quantization achieves video quality comparable to FP16 and to dynamically quantized ViDiT-Q, as measured by CLIP and VQA metrics.
Per-step calibration data improves quantization accuracy.
The method enables efficient deployment on resource-limited devices.
Abstract
Diffusion Transformers for video generation have gained significant research interest since the impressive performance of SORA. Efficient deployment of such generative-AI models on GPUs has been demonstrated with dynamic quantization. However, resource-constrained devices cannot support dynamic quantization, and need static quantization of the models for their efficient deployment on AI processors. In this paper, we propose a novel method for the post-training quantization of OpenSora\cite{opensora}, a Video Diffusion Transformer, without relying on dynamic quantization techniques. Our approach employs static quantization, achieving video quality comparable to FP16 and dynamically quantized ViDiT-Q methods, as measured by CLIP and VQA metrics. In particular, we utilize per-step calibration data to adequately provide a post-training statically quantized model for each time step,…
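The per-step calibration described in the abstract can be illustrated with a small sketch. It assumes symmetric INT8 quantization with absolute-maximum scales, which the abstract does not specify; the function names and the toy calibration data are hypothetical. The point is that each diffusion time step gets its own precomputed scale, so no statistics need to be computed from the live tensor at inference time, unlike dynamic quantization.

```python
# Sketch only: symmetric INT8 with absmax scales is an assumption, and the
# calibration data below is made up for illustration.

QMAX = 127  # largest magnitude representable in symmetric INT8

def calibrate_per_step(calib_acts):
    """Map each time step to a fixed scale from its calibration activations."""
    scales = {}
    for step, samples in calib_acts.items():
        absmax = max(abs(v) for sample in samples for v in sample)
        scales[step] = max(absmax, 1e-8) / QMAX
    return scales

def quantize_static(x, scale):
    """Quantize with a precomputed scale; nothing is measured at runtime."""
    return [max(-QMAX, min(QMAX, round(v / scale))) for v in x]

def dequantize(q, scale):
    return [v * scale for v in q]

# Hypothetical calibration activations: step 0 has a wider range than step 1.
calib = {
    0: [[0.1, -2.0, 1.5], [0.3, 1.9, -0.7]],
    1: [[0.05, -0.4, 0.2], [0.1, 0.35, -0.25]],
}
step_scales = calibrate_per_step(calib)

# At inference, look up the scale for the current time step.
x = [0.12, -0.38, 0.3]
q = quantize_static(x, step_scales[1])
x_hat = dequantize(q, step_scales[1])
max_err_own_step = max(abs(a - b) for a, b in zip(x, x_hat))

# Reusing the step-0 scale for step-1 activations wastes resolution.
x_hat_wrong = dequantize(quantize_static(x, step_scales[0]), step_scales[0])
max_err_other_step = max(abs(a - b) for a, b in zip(x, x_hat_wrong))
```

In this toy case the step-specific scale gives a smaller reconstruction error than a scale borrowed from a step with a wider activation range, which is the intuition behind calibrating each time step separately.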
Taxonomy
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Dense Connections · Absolute Position Encodings · Residual Connection · Adam · Layer Normalization · Contrastive Language-Image Pre-training · Label Smoothing
