6Bit-Diffusion: Inference-Time Mixed-Precision Quantization for Video Diffusion Models

Rundong Su; Jintao Zhang; Zhihang Yuan; Haojie Duanmu; Jianfei Chen; Jun Zhu

arXiv:2603.18742·cs.CV·March 20, 2026

Open Access

TL;DR

This paper introduces a dynamic mixed-precision quantization framework and a temporal redundancy exploitation technique to significantly improve the efficiency of video diffusion models during inference, reducing memory and computation costs.

Contribution

It proposes a novel adaptive mixed-precision quantization method and a temporal delta cache mechanism for efficient video diffusion model inference.

Findings

01. Achieves 1.92× end-to-end acceleration

02. Reduces memory usage by 3.32×

03. Maintains high generation quality

Abstract

Diffusion transformers have demonstrated remarkable capabilities in generating videos. However, their practical deployment is severely constrained by high memory usage and computational cost. Post-Training Quantization provides a practical way to reduce memory usage and boost computation speed. Existing quantization methods typically apply a static bit-width allocation, overlooking the quantization difficulty of activations across diffusion timesteps, which leads to a suboptimal trade-off between efficiency and quality. In this paper, we propose an inference-time NVFP4/INT8 Mixed-Precision Quantization framework. We find a strong linear correlation between a block's input-output difference and the quantization sensitivity of its internal linear layers. Based on this insight, we design a lightweight predictor that dynamically allocates NVFP4 to temporally stable layers to maximize memory…
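The allocation idea in the abstract can be illustrated with a small sketch. This is not the authors' predictor: the relative L2 metric, the linear coefficients (`slope`, `intercept`), the `threshold` value, and the fallback to INT8 for blocks that are not judged stable are all assumptions made for illustration, and every function name is hypothetical.

```python
from math import sqrt


def relative_difference(block_input, block_output):
    """Relative L2 difference between a block's input and output vectors."""
    diff = sqrt(sum((o - i) ** 2 for i, o in zip(block_input, block_output)))
    norm = sqrt(sum(i ** 2 for i in block_input))
    return diff / (norm + 1e-12)  # small epsilon avoids division by zero


def predict_sensitivity(rel_diff, slope=1.0, intercept=0.0):
    """Hypothetical linear predictor: sensitivity ~ slope * rel_diff + intercept.

    The coefficients are placeholders; in practice they would have to be
    fitted on calibration data.
    """
    return slope * rel_diff + intercept


def choose_precision(block_input, block_output, threshold=0.1):
    """Pick a format for a block's linear layers (threshold is illustrative).

    Assumption: blocks predicted to be insensitive get NVFP4, all others INT8.
    """
    rel_diff = relative_difference(block_input, block_output)
    sensitivity = predict_sensitivity(rel_diff)
    return "NVFP4" if sensitivity < threshold else "INT8"


# A block whose output barely differs from its input is treated as stable.
stable = choose_precision([1.0, 2.0, 3.0], [1.01, 2.0, 2.99])
# A block whose output differs strongly from its input is kept at INT8.
volatile = choose_precision([1.0, 2.0, 3.0], [2.0, 0.5, 4.0])
print(stable, volatile)
```

The decision is made per block at each call, so the bit-width can change from one diffusion timestep to the next rather than being fixed ahead of time, which is the contrast with static allocation that the abstract draws.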

Taxonomy

Topics: Image and Video Quality Assessment · Video Coding and Compression Technologies · Advanced Data Compression Techniques