Hardware-Friendly Static Quantization Method for Video Diffusion Transformers

Sanghyun Yi; Qingfeng Liu; Mostafa El-Khamy

arXiv:2502.15077 · cs.CV · June 18, 2025


TL;DR

This paper introduces a static post-training quantization method for Video Diffusion Transformers, enabling efficient deployment on resource-constrained devices with video quality comparable to FP16 and dynamically quantized baselines.

Contribution

It presents a novel static quantization approach for Video Diffusion Transformers that matches dynamic quantization performance using per-step calibration and smooth-quantization techniques.
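As a rough illustration of what a smooth-quantization step can look like, the sketch below follows the widely used SmoothQuant-style recipe: per-channel activation outliers are divided out and folded into the weights, so the layer output is unchanged while the activations become easier to quantize with a fixed scale. This is an assumption about the general technique, not the paper's exact formulation; the function names, the `alpha` value, and the toy numbers are all hypothetical.

```python
from typing import List


def smoothing_factors(act_absmax: List[float],
                      weight: List[List[float]],
                      alpha: float = 0.5) -> List[float]:
    """Per-input-channel factors s_j = max|X_j|^alpha / max|W_j|^(1 - alpha).

    `weight` is laid out as [out_features][in_features].
    `alpha` (hypothetical default) balances how much difficulty moves to the weights.
    """
    factors = []
    for j, a in enumerate(act_absmax):
        w = max(abs(row[j]) for row in weight)
        factors.append(max(a, 1e-8) ** alpha / max(w, 1e-8) ** (1 - alpha))
    return factors


def linear(x: List[float], weight: List[List[float]]) -> List[float]:
    """Plain matrix-vector product standing in for a linear layer."""
    return [sum(xj * wj for xj, wj in zip(x, row)) for row in weight]


# Toy data: input channel 0 carries an activation outlier.
x = [40.0, 0.5, -0.25]
weight = [[0.1, 0.8, -0.6],
          [0.05, -0.4, 0.9]]

# In practice these maxima would come from calibration data, not from one input.
act_absmax = [abs(v) for v in x]
s = smoothing_factors(act_absmax, weight)

# Divide activations by s and multiply the matching weight columns by s.
x_smooth = [v / sj for v, sj in zip(x, s)]
w_smooth = [[w * sj for w, sj in zip(row, s)] for row in weight]

y_ref = linear(x, weight)
y_smooth = linear(x_smooth, w_smooth)
```

Because the factors cancel inside the product, `y_smooth` matches `y_ref` up to floating-point error, while the largest activation magnitude shrinks. The factors can be computed offline, which is what makes this kind of smoothing compatible with static quantization.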

Findings

01

Static quantization achieves comparable video quality to FP16 and dynamic methods.

02

Per-step calibration data improves quantization accuracy.

03

The method enables efficient deployment on resource-limited devices.

Abstract

Diffusion Transformers for video generation have gained significant research interest since the impressive performance of SORA. Efficient deployment of such generative-AI models on GPUs has been demonstrated with dynamic quantization. However, resource-constrained devices cannot support dynamic quantization and need static quantization of the models for efficient deployment on AI processors. In this paper, we propose a novel method for the post-training quantization of OpenSora, a Video Diffusion Transformer, without relying on dynamic quantization techniques. Our approach employs static quantization, achieving video quality comparable to FP16 and dynamically quantized ViDiT-Q methods, as measured by CLIP and VQA metrics. In particular, we utilize per-step calibration data to adequately provide a post-training statically quantized model for each time step,…
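The difference between static and dynamic quantization, and the role of per-step calibration, can be sketched as follows. Dynamic quantization derives the activation scale from each input at run time, whereas static quantization fixes the scale offline; keeping one fixed scale per denoising step lets the scale track how activation ranges change across steps. This is a minimal sketch under stated assumptions: symmetric 8-bit per-tensor quantization, abs-max calibration, and toy activation values are all choices made here for illustration, not details taken from the paper.

```python
from typing import Dict, List

QMAX = 127  # symmetric signed 8-bit range (assumed bit width)


def calibrate_per_step(samples: Dict[int, List[List[float]]]) -> Dict[int, float]:
    """Return one fixed activation scale per denoising step.

    `samples` maps a step index to activation vectors recorded offline
    on calibration prompts at that step.
    """
    scales = {}
    for step, batches in samples.items():
        max_abs = max(abs(v) for batch in batches for v in batch)
        scales[step] = max(max_abs, 1e-8) / QMAX
    return scales


def quantize(values: List[float], scale: float) -> List[int]:
    """Round to the integer grid and clamp to the 8-bit range."""
    return [max(-QMAX, min(QMAX, round(v / scale))) for v in values]


def dequantize(q: List[int], scale: float) -> List[float]:
    return [v * scale for v in q]


def max_err(a: List[float], b: List[float]) -> float:
    return max(abs(p - r) for p, r in zip(a, b))


# Toy calibration set: activation magnitudes differ strongly between steps.
calibration = {
    0: [[8.0, -6.0, 2.0], [7.5, -8.0, 1.0]],
    1: [[0.5, -0.4, 0.1], [0.45, -0.5, 0.2]],
}
step_scales = calibrate_per_step(calibration)  # computed once, offline

# Inference-time activation at step 1; no range statistics are computed here.
x = [0.3, -0.2, 0.05]
per_step = dequantize(quantize(x, step_scales[1]), step_scales[1])

# Baseline: a single static scale shared by all steps.
global_scale = max(step_scales.values())
single = dequantize(quantize(x, global_scale), global_scale)

print("per-step error:", max_err(x, per_step))
print("single-scale error:", max_err(x, single))
```

In this toy setup the per-step scale reconstructs the step-1 activation more accurately than a single scale stretched to cover the large step-0 range, while still requiring only a table lookup at inference time.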

Taxonomy

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Dense Connections · Absolute Position Encodings · Residual Connection · Adam · Layer Normalization · Contrastive Language-Image Pre-training · Label Smoothing