TL;DR
Q-ARVD is a quantization framework designed for autoregressive video diffusion models (ARVDs). It targets ARVD-specific problems such as unbalanced frame-wise quantization sensitivity and outliers, with the goal of making real-time video generation cheaper to run.
Contribution
Q-ARVD is presented as the first quantization method tailored to ARVDs. It aims to reduce inference cost while preserving video quality by handling frame-wise sensitivity and outliers.
Findings
Q-ARVD outperforms existing quantization schemes on ARVDs.
The method effectively manages outliers and frame sensitivity issues.
The method demonstrates a significant reduction in inference cost.
Abstract
Autoregressive video diffusion models (ARVDs) have emerged as a promising architecture for streaming video generation, paving the way for real-time interactive video generation and world modeling. Despite their potential, the substantial inference cost of ARVDs remains a major obstacle to practical deployment, making model quantization a natural direction for improving efficiency. However, quantization for ARVDs remains largely unexplored. Our empirical analysis shows that directly applying existing quantization schemes developed for standard diffusion transformers to ARVDs leads to suboptimal performance, revealing quantization behaviors that differ from those observed in bidirectional diffusion models. In this paper, we identify two critical challenges in quantizing ARVDs: (C1) Highly unbalanced frame-wise quantization sensitivity. Error accumulation during autoregressive generation…
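To make the outlier problem concrete, here is a minimal sketch of plain symmetric per-tensor uniform quantization. It is not the Q-ARVD method. The function names, the Gaussian test data, and the outlier value of 200.0 are all assumptions chosen for illustration. It shows how one large value stretches the quantization scale and raises the error on every other value.

```python
import random


def fake_quantize(values, num_bits=8):
    """Symmetric per-tensor uniform quantization, then dequantization.

    Generic textbook scheme used for illustration only; it is not the
    quantizer proposed in the paper.
    """
    qmax = 2 ** (num_bits - 1) - 1              # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    dequantized = []
    for v in values:
        q = round(v / scale)                    # map to the integer grid
        q = max(-qmax, min(qmax, q))            # clamp to the signed range
        dequantized.append(q * scale)           # map back to real values
    return dequantized


def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
# Hypothetical activations: unit Gaussian values standing in for one tensor.
activations = [rng.gauss(0.0, 1.0) for _ in range(4096)]
# Same tensor with a single hypothetical outlier appended.
with_outlier = activations + [200.0]

err_clean = mean_squared_error(activations, fake_quantize(activations))
# Measure error on the ordinary values only, excluding the outlier itself.
err_outlier = mean_squared_error(activations, fake_quantize(with_outlier)[:-1])

print(f"MSE without outlier: {err_clean:.6f}")
print(f"MSE with outlier:    {err_outlier:.6f}")
```

Because the scale is set by the largest magnitude, the outlier forces a much coarser grid on the remaining values. Outlier-aware quantization schemes are meant to avoid this effect.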
