TL;DR
VecAttention introduces a vector-wise sparse attention method that accelerates long-context video inference (a 2.65× speedup over full attention) while maintaining comparable accuracy, and it outperforms existing sparse attention approaches.
Contribution
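The paper proposes VecAttention, a novel vector-wise sparse attention framework for video models. It exploits the vertical-vector sparse pattern observed in video attention maps to achieve better accuracy-efficiency trade-offs than existing coarse-grained sparse patterns.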
Findings
Achieves 2.65× speedup over full attention.
Attains 1.83× speedup over state-of-the-art sparse methods.
Maintains accuracy comparable to that of full attention.
Abstract
Long-context video understanding and generation pose a significant computational challenge for Transformer-based video models due to the quadratic complexity of self-attention. While existing sparse attention methods employ coarse-grained patterns to improve efficiency, they typically incur redundant computation and suboptimal performance. To address this issue, we propose VecAttention, a novel framework of vector-wise sparse attention that achieves superior accuracy-efficiency trade-offs for video models. We observe that video attention maps exhibit a strong vertical-vector sparse pattern, and further demonstrate that this vertical-vector pattern offers consistently better accuracy-sparsity trade-offs compared with existing coarse-grained sparse patterns. Based on this observation, VecAttention dynamically selects and processes only informative vertical vectors…
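To make "selecting and processing only informative vertical vectors" concrete, here is a minimal NumPy sketch under stated assumptions. It treats a vertical vector as the column segment of the attention map spanned by one block of queries, scores every key column with the block's mean query, keeps a fixed fraction of the highest-scoring columns, and runs softmax attention only over those columns. The function names, `block_size`, `keep_ratio`, and the mean-query scoring heuristic are all hypothetical illustrations. They are not VecAttention's actual selection rule or kernel, since the abstract is truncated before those details.

```python
import numpy as np


def full_attention(q, k, v):
    """Dense softmax attention, used as the reference."""
    s = q @ k.T / np.sqrt(q.shape[1])
    s = s - s.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(s)
    w /= w.sum(axis=1, keepdims=True)
    return w @ v


def vertical_vector_sparse_attention(q, k, v, block_size=4, keep_ratio=0.25):
    """Hypothetical sketch: per query block, attend only to the top-scoring key columns.

    Assumption: column importance is estimated with the block's mean query.
    This is an illustrative heuristic, not the paper's selection method.
    """
    n_q, d = q.shape
    n_k = k.shape[0]
    out = np.zeros((n_q, v.shape[1]), dtype=np.result_type(q.dtype, v.dtype))
    n_keep = max(1, int(round(keep_ratio * n_k)))
    scale = 1.0 / np.sqrt(d)
    for start in range(0, n_q, block_size):
        qb = q[start:start + block_size]            # (b, d) query block
        proxy = (qb.mean(axis=0) @ k.T) * scale     # (n_k,) estimated column importance
        idx = np.argsort(proxy)[-n_keep:]           # selected vertical vectors (key columns)
        s = (qb @ k[idx].T) * scale                 # (b, n_keep) scores on kept columns only
        s = s - s.max(axis=1, keepdims=True)
        w = np.exp(s)
        w /= w.sum(axis=1, keepdims=True)           # softmax restricted to kept columns
        out[start:start + block_size] = w @ v[idx]
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.standard_normal((16, 8))
    k = rng.standard_normal((32, 8))
    v = rng.standard_normal((32, 8))
    approx = vertical_vector_sparse_attention(q, k, v, block_size=4, keep_ratio=0.25)
    exact = full_attention(q, k, v)
    print("max abs error vs. full attention:", np.abs(approx - exact).max())
```

With `keep_ratio=1.0` every column is kept and the sketch reduces to full attention. Lowering `keep_ratio` trades accuracy for fewer score computations per query block. This Python loop only illustrates which entries are computed and gives no speedup by itself. Realizing gains like those reported would require a fused GPU kernel that skips the unselected columns.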
