PISA: Piecewise Sparse Attention Is Wiser for Efficient Diffusion Transformers
Haopeng Li, Shitong Shao, Wenliang Zhong, Zikai Zhou, Lichen Bai, Hui Xiong, Zeke Xie

TL;DR
PISA is a piecewise sparse attention mechanism that approximates non-critical attention scores instead of discarding them, improving the efficiency of diffusion transformers while maintaining high quality in image and video generation.
Contribution
The paper proposes a training-free sparse attention method that covers the full attention span by combining exact and approximate computations, narrowing the gap between speed and quality in diffusion transformers.
Findings
Achieves up to 2.57x speedup on video models.
Maintains high quality comparable to full attention.
Provides acceleration in image generation without quality loss.
Abstract
Diffusion Transformers are fundamental for video and image generation, but their efficiency is bottlenecked by the quadratic complexity of attention. While block sparse attention accelerates computation by attending only to critical key-value blocks, it suffers from degradation at high sparsity because it discards context. In this work, we discover that attention scores of non-critical blocks exhibit distributional stability, allowing them to be approximated accurately and efficiently rather than discarded, which is essential for sparse attention design. Motivated by this key insight, we propose PISA, a training-free Piecewise Sparse Attention that covers the full attention span with sub-quadratic complexity. Unlike the conventional keep-or-drop paradigm that directly drops the non-critical block information, PISA introduces a novel exact-or-approximate strategy: it maintains exact…
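To make the contrast between keep-or-drop and exact-or-approximate concrete, here is a minimal single-query sketch in pure Python. It is not the paper's method: the abstract is truncated before it describes PISA's approximation, so this sketch substitutes a simple stand-in in which each non-critical block is represented by its mean key and mean value. The function names, the `block_size` and `top_k` parameters, and the block-ranking rule are all assumptions made for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def block_mean(rows):
    n = len(rows)
    return [sum(r[d] for r in rows) / n for d in range(len(rows[0]))]


def full_attention(q, keys, values):
    """Exact softmax attention for one query vector."""
    scores = [dot(q, k) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / z for d in range(dim)]


def piecewise_attention(q, keys, values, block_size, top_k, approximate=True):
    """Exact scores on the top_k highest-ranked blocks (top_k >= 1).

    Remaining blocks are either dropped (approximate=False, keep-or-drop) or
    replaced by one summary term each (approximate=True). The mean-key /
    mean-value summary is an ASSUMED stand-in, not PISA's approximation.
    """
    blocks = [
        (keys[s:s + block_size], values[s:s + block_size])
        for s in range(0, len(keys), block_size)
    ]
    summaries = [(block_mean(kb), block_mean(vb), len(kb)) for kb, vb in blocks]
    # Hypothetical ranking rule: score the query against each block's mean key.
    coarse = [dot(q, mk) for mk, _, _ in summaries]
    ranked = sorted(range(len(blocks)), key=lambda i: coarse[i], reverse=True)
    critical = set(ranked[:top_k])

    # Each term is (score, multiplicity, value vector).
    terms = []
    for i, (kb, vb) in enumerate(blocks):
        if i in critical:
            terms += [(dot(q, k), 1, v) for k, v in zip(kb, vb)]
        elif approximate:
            _, mean_value, n = summaries[i]
            terms.append((coarse[i], n, mean_value))

    m = max(s for s, _, _ in terms)
    z = sum(n * math.exp(s - m) for s, n, _ in terms)
    dim = len(values[0])
    return [
        sum(n * math.exp(s - m) * v[d] for s, n, v in terms) / z
        for d in range(dim)
    ]


q = [1.0, 0.0]
keys = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
values = [[1.0], [3.0], [5.0], [7.0]]
exact = full_attention(q, keys, values)
approx = piecewise_attention(q, keys, values, block_size=2, top_k=1)
dropped = piecewise_attention(q, keys, values, block_size=2, top_k=1, approximate=False)
```

In this toy input the keys inside each block are identical, so the summary term reproduces full attention, while dropping the second block shifts the output toward the first block's values. With real data the summary is only an approximation, and its accuracy depends on how stable the scores within a non-critical block are.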
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques · Image and Video Quality Assessment
