FPSAttention: Training-Aware FP8 and Sparsity Co-Design for Fast Video Diffusion

Akide Liu; Zeyu Zhang; Zhexin Li; Xuehai Bai; Yizeng Han; Jiasheng Tang; Yuanjie Xing; Jichao Wu; Mingyang Yang; Weihua Chen; Jiahao He; Yuanyu He; Fan Wang; Gholamreza Haffari; Bohan Zhuang

arXiv:2506.04648 · cs.CV · June 9, 2025

Open Access

TL;DR

FPSAttention is a training-aware co-design of FP8 quantization and sparsity for video diffusion models, focused on 3D bi-directional attention. It reports a 7.09x attention-kernel speedup and a 4.96x end-to-end speedup for video generation while maintaining quality at 720p resolution.

Contribution

The paper presents a joint FP8 quantization and sparsity approach built on a unified 3D tile-wise granularity shared by both techniques, a denoising step-aware strategy that adapts to the noise schedule, and hardware-optimized kernels for efficient video diffusion.
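To make the idea of one granularity serving both quantization and sparsity more concrete, here is a minimal NumPy sketch. It is not the paper's kernel or training procedure. The tile shape, the mean-pooled top-k tile selection, the helper names, and the software simulation of FP8 (E4M3) rounding are all assumptions chosen for illustration, and the sketch omits the training-aware and denoising step-aware parts entirely.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in E4M3


def fake_quant_e4m3(x):
    """Simulate E4M3 rounding in float64 (ignores subnormals and exponent limits)."""
    x = np.clip(x, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    mant, exp = np.frexp(x)            # x = mant * 2**exp with 0.5 <= |mant| < 1
    mant = np.round(mant * 16) / 16    # 1 implicit + 3 explicit mantissa bits
    return np.ldexp(mant, exp)


def to_tiles(x, tile):
    """Group a (T, H, W, D) token grid into (num_tiles, tokens_per_tile, D)."""
    T, H, W, D = x.shape
    t, h, w = tile
    x = x.reshape(T // t, t, H // h, h, W // w, w, D)
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    return x.reshape(-1, t * h * w, D)


def quantize_tiles(tiles):
    """One scale per 3D tile: map the tile's max magnitude onto the E4M3 range."""
    amax = np.abs(tiles).max(axis=(1, 2), keepdims=True)
    scale = np.maximum(amax, 1e-12) / FP8_E4M3_MAX
    return fake_quant_e4m3(tiles / scale), scale


def tile_mask(q_tiles, k_tiles, keep):
    """Assumed selection rule: keep the top-`keep` key tiles per query tile,
    scored by the dot product of mean-pooled tiles."""
    scores = q_tiles.mean(axis=1) @ k_tiles.mean(axis=1).T  # (num_q_tiles, num_k_tiles)
    top = np.argsort(-scores, axis=1)[:, :keep]
    mask = np.zeros(scores.shape, dtype=bool)
    np.put_along_axis(mask, top, True, axis=1)
    return mask


def sparse_quant_attention(q, k, v, tile=(2, 2, 2), keep=2):
    """Attention where the same tiles define quantization scales and the sparsity mask."""
    qt, kt, vt = (to_tiles(a, tile) for a in (q, k, v))
    mask = tile_mask(qt, kt, keep)
    qq, qs = quantize_tiles(qt)
    kq, ks = quantize_tiles(kt)
    d = q.shape[-1]
    out = np.zeros_like(qt)
    for i in range(qt.shape[0]):
        kept = np.flatnonzero(mask[i])  # key tiles this query tile attends to
        # Low-precision matmul per tile pair, then rescale with the two tile scales.
        logits = np.concatenate(
            [(qq[i] @ kq[j].T) * qs[i] * ks[j] for j in kept], axis=1
        ) / np.sqrt(d)
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        out[i] = p @ np.concatenate([vt[j] for j in kept], axis=0)
    return out, mask


rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 4, 4, 8)) for _ in range(3))
out, mask = sparse_quant_attention(q, k, v, tile=(2, 2, 2), keep=3)
```

Because the mask and the scales are both indexed by tile, a skipped tile pair costs neither a matmul nor a rescale, which is the kind of alignment a shared granularity is meant to provide. A real implementation would fuse these steps into a GPU kernel rather than loop in Python.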

Findings

01. 7.09x kernel speedup for attention operations

02. 4.96x end-to-end speedup for video generation

03. Maintains quality at 720p resolution

Abstract

Diffusion generative models have become the standard for producing high-quality, coherent video content, yet their slow inference speeds and high computational demands hinder practical deployment. Although both quantization and sparsity can independently accelerate inference while maintaining generation quality, naively combining these techniques in existing training-free approaches leads to significant performance degradation due to the lack of joint optimization. We introduce FPSAttention, a novel training-aware co-design of FP8 quantization and sparsity for video generation, with a focus on the 3D bi-directional attention mechanism. Our approach features three key innovations: 1) A unified 3D tile-wise granularity that simultaneously supports both quantization and sparsity; 2) A denoising step-aware strategy that adapts to the noise schedule, addressing the strong correlation between…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques · Image and Video Quality Assessment

Methods: Focus