Prism: Spectral-Aware Block-Sparse Attention
Xinghao Wang, Pengyu Wang, Xiaoran Liu, Fangxu Liu, Jason Chu, Kai Song, Xipeng Qiu

TL;DR
Prism is a training-free, spectral-aware block-sparse attention method that improves block importance estimation for long-context LLM pre-filling. It traces the inaccuracy of mean-pooled coarse-grained attention to the interaction between mean pooling and Rotary Positional Embeddings (RoPE), and reports up to 5.1x speedup with accuracy on par with full attention.
Contribution
Prism introduces a spectral-aware, training-free approach that decomposes block selection into frequency components and calibrates positional signals for efficient, accurate block importance estimation.
Findings
Maintains accuracy parity with full attention.
Achieves up to 5.1x speedup in block-sparse attention.
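Shows that mean pooling acts as a low-pass filter under RoPE, which suppresses high-frequency dimensions carrying local positional information, and corrects for this in block selection.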
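The mean-pooling limitation in the last finding can be illustrated numerically. The following is a minimal sketch, not the paper's code: it assumes the standard RoPE frequency schedule with base 10000, a head dimension of 128, a block size of 64, and an idealized block in which every token has the same unit-magnitude content, so only the RoPE rotation differs across positions. All function names are hypothetical.

```python
import cmath
import math

def rope_frequencies(head_dim, base=10000.0):
    """Standard RoPE angular frequencies, one per 2-D rotation pair (assumed schedule)."""
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]

def pooled_magnitude(theta, block_size):
    """Magnitude of the mean of exp(i * theta * n) for n = 0 .. block_size - 1."""
    total = sum(cmath.exp(1j * theta * n) for n in range(block_size))
    return abs(total) / block_size

def closed_form(theta, block_size):
    """Same quantity via the geometric-series (Dirichlet kernel) identity."""
    denom = block_size * math.sin(theta / 2.0)
    if abs(denom) < 1e-12:
        return 1.0
    return abs(math.sin(block_size * theta / 2.0) / denom)

freqs = rope_frequencies(head_dim=128)   # assumed head dimension
block_size = 64                          # assumed block size
gains = [pooled_magnitude(t, block_size) for t in freqs]

# gains[0] is the highest-frequency pair, gains[-1] the lowest-frequency pair.
print(f"highest frequency gain: {gains[0]:.4f}")
print(f"lowest frequency gain:  {gains[-1]:.4f}")
```

Under these assumptions the highest-frequency pair is attenuated to a small fraction of its original magnitude after pooling, while the lowest-frequency pair passes through almost unchanged, which is the low-pass behavior the summary describes.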
Abstract
Block-sparse attention is promising for accelerating long-context LLM pre-filling, yet identifying relevant blocks efficiently remains a bottleneck. Existing methods typically employ coarse-grained attention as a proxy for block importance estimation, but often resort to expensive token-level searching or scoring, resulting in significant selection overhead. In this work, we trace the inaccuracy of standard coarse-grained attention via mean pooling to a theoretical root cause: the interaction between mean pooling and Rotary Positional Embeddings (RoPE). We prove that mean pooling acts as a low-pass filter that induces destructive interference in high-frequency dimensions, effectively creating a "blind spot" for local positional information (e.g., slash patterns). To address this, we introduce Prism, a training-free spectral-aware approach that decomposes block selection into…
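For context, the coarse-grained proxy that the abstract critiques can be sketched as follows. This is a generic mean-pooling baseline written for illustration, not Prism itself and not the authors' implementation: the function names, the top-k selection rule, and the toy inputs are assumptions, and causal masking and softmax scaling are omitted for brevity.

```python
def mean_pool(vectors, block_size):
    """Average consecutive groups of block_size vectors into one vector per block."""
    pooled = []
    for start in range(0, len(vectors), block_size):
        block = vectors[start:start + block_size]
        dim = len(block[0])
        pooled.append([sum(v[d] for v in block) / len(block) for d in range(dim)])
    return pooled

def select_blocks(queries, keys, block_size, top_k):
    """For each query block, return the indices of the top_k key blocks
    ranked by the dot product of mean-pooled queries and keys."""
    q_blocks = mean_pool(queries, block_size)
    k_blocks = mean_pool(keys, block_size)
    selected = []
    for q in q_blocks:
        scores = [sum(a * b for a, b in zip(q, k)) for k in k_blocks]
        order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        selected.append(sorted(order[:top_k]))
    return selected

# Toy input: 4 tokens, 2 dimensions, 2 tokens per block (hypothetical values).
queries = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
keys = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
selected = select_blocks(queries, keys, block_size=2, top_k=1)
print(selected)  # [[1], [0]]
```

Block-sparse attention would then compute exact attention only over the selected key blocks. Because the scores come from pooled vectors, any signal that averages out within a block is invisible to this selection step.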
Taxonomy
Topics: Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis · Stochastic Gradient Optimization Techniques
