PISA: Piecewise Sparse Attention Is Wiser for Efficient Diffusion Transformers

Haopeng Li; Shitong Shao; Wenliang Zhong; Zikai Zhou; Lichen Bai; Hui Xiong; Zeke Xie

arXiv:2602.01077·cs.CV·February 4, 2026

TL;DR

PISA introduces a piecewise sparse attention mechanism that approximates the attention scores of non-critical blocks instead of discarding them, improving the efficiency of diffusion transformers while maintaining high quality in image and video generation.

Contribution

The paper proposes PISA, a training-free sparse attention method that covers the full attention span by combining exact and approximate computation, narrowing the gap between speed and quality in diffusion transformers.

Findings

01 Achieves up to 2.57x speedup on video models.

02 Maintains high quality comparable to full attention.

03 Provides acceleration in image generation without quality loss.

Abstract

Diffusion Transformers are fundamental for video and image generation, but their efficiency is bottlenecked by the quadratic complexity of attention. While block sparse attention accelerates computation by attending only to critical key-value blocks, it suffers from degradation at high sparsity because it discards context. In this work, we discover that the attention scores of non-critical blocks exhibit distributional stability, allowing them to be approximated accurately and efficiently rather than discarded, which is an important consideration for sparse attention design. Motivated by this key insight, we propose PISA, a training-free Piecewise Sparse Attention that covers the full attention span with sub-quadratic complexity. Unlike the conventional keep-or-drop paradigm, which directly drops the non-critical block information, PISA introduces a novel exact-or-approximate strategy: it maintains exact…
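The exact-or-approximate idea in the abstract can be illustrated with a small sketch. The abstract is cut off before it describes how PISA approximates non-critical blocks, so the approximation used here (replacing every key in a non-critical block with the block's mean key) is an assumption chosen for illustration, not the paper's method. The function names `full_attention` and `piecewise_attention`, the parameters `block_size` and `keep`, and the single-query setup are likewise hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def full_attention(q, K, V):
    """Reference softmax attention for one query vector q."""
    logits = [dot(q, k) for k in K]
    m = max(logits)  # subtract the max for numerical stability
    w = [math.exp(s - m) for s in logits]
    z = sum(w)
    d = len(V[0])
    return [sum(w[i] * V[i][c] for i in range(len(V))) / z for c in range(d)]


def piecewise_attention(q, K, V, block_size, keep):
    """Exact attention on the `keep` highest-scoring key blocks and a
    block-mean approximation on all others (an illustrative stand-in)."""
    n, d = len(K), len(V[0])
    blocks = [range(s, min(s + block_size, n)) for s in range(0, n, block_size)]

    # Score each block by the logit of its mean key.
    mean_logit = []
    for blk in blocks:
        mean_k = [sum(K[i][c] for i in blk) / len(blk) for c in range(len(q))]
        mean_logit.append(dot(q, mean_k))
    order = sorted(range(len(blocks)), key=lambda j: mean_logit[j], reverse=True)
    critical = set(order[:keep])

    # Each term is (logit, summed value vector, number of keys it stands for).
    terms = []
    for j, blk in enumerate(blocks):
        if j in critical:
            # Exact branch: one term per key.
            terms.extend((dot(q, K[i]), V[i], 1) for i in blk)
        else:
            # Approximate branch: one term per block, values pre-summed.
            v_sum = [sum(V[i][c] for i in blk) for c in range(d)]
            terms.append((mean_logit[j], v_sum, len(blk)))

    m = max(t[0] for t in terms)
    num, den = [0.0] * d, 0.0
    for logit, v_sum, count in terms:
        w = math.exp(logit - m)
        den += w * count
        for c in range(d):
            num[c] += w * v_sum[c]
    return [x / den for x in num]


if __name__ == "__main__":
    q = [0.5, -1.0]
    K = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [-1.0, 0.5]]
    V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    print("full     :", full_attention(q, K, V))
    print("piecewise:", piecewise_attention(q, K, V, block_size=2, keep=1))
```

In this sketch a non-critical block still contributes to both the numerator and the softmax normalizer, which is the difference from a keep-or-drop scheme that would skip the block entirely. Setting `keep` to the total number of blocks recovers full attention, and the approximation is exact whenever all keys inside a non-critical block are identical.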

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques · Image and Video Quality Assessment