VecAttention: Vector-wise Sparse Attention for Accelerating Long Context Inference

Anmin Liu; Ruixuan Yang; Huiqiang Jiang; Bin Lin; Minmin Sun; Yong Li; Chen Zhang; Tao Xie

arXiv:2603.29494 · cs.CV · April 1, 2026

TL;DR

VecAttention is a vector-wise sparse attention method that accelerates long-context video inference, reporting a 2.65× speedup over full attention and a 1.83× speedup over state-of-the-art sparse attention methods while keeping accuracy comparable to full attention.

Contribution

The paper proposes VecAttention, a novel vector-wise sparse attention framework that leverages vertical-vector patterns for better efficiency and accuracy in video models.

Findings

01. Achieves 2.65× speedup over full attention.
02. Attains 1.83× speedup over state-of-the-art sparse methods.
03. Maintains comparable accuracy to full attention.

Abstract

Long-context video understanding and generation pose a significant computational challenge for Transformer-based video models due to the quadratic complexity of self-attention. While existing sparse attention methods employ coarse-grained patterns to improve efficiency, they typically incur redundant computation and suboptimal performance. To address this issue, we propose VecAttention, a novel framework of vector-wise sparse attention that achieves superior accuracy-efficiency trade-offs for video models. We observe that video attention maps exhibit a strong vertical-vector sparse pattern, and further demonstrate that this vertical-vector pattern offers consistently better accuracy-sparsity trade-offs compared with existing coarse-grained sparse patterns. Based on this observation, VecAttention dynamically selects and processes only informative vertical vectors…
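To make the idea of vertical-vector sparsity concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the abstract excerpt does not specify how vectors are scored or selected, so this sketch assumes that a "vertical vector" is a short column segment of the attention map (one key attended to by a block of consecutive queries), and it uses a hypothetical importance estimate (the block's mean query dotted with every key) to pick which columns to keep. The names `vertical_vector_attention`, `block_size`, and `keep` are illustrative only.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def full_attention(q, k, v):
    # Dense reference: every query attends to every key (quadratic cost).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


def vertical_vector_attention(q, k, v, block_size=4, keep=8):
    """Illustrative vertical-vector sparse attention (assumed formulation).

    For each block of `block_size` consecutive queries, only `keep` key
    columns are retained, so the kept part of the attention map consists of
    column segments of height `block_size` and width 1.
    """
    n, d = q.shape
    keep = min(keep, k.shape[0])
    out = np.empty((n, v.shape[-1]))
    for start in range(0, n, block_size):
        qb = q[start:start + block_size]
        # Hypothetical importance estimate: mean query of the block vs. all keys.
        est = qb.mean(axis=0) @ k.T
        # Indices of the `keep` highest-scoring key columns (order is irrelevant).
        idx = np.argpartition(est, -keep)[-keep:]
        # Exact attention restricted to the selected columns.
        scores = qb @ k[idx].T / np.sqrt(d)
        out[start:start + block_size] = softmax(scores) @ v[idx]
    return out


rng = np.random.default_rng(0)
q = rng.standard_normal((16, 8))
k = rng.standard_normal((16, 8))
v = rng.standard_normal((16, 8))
dense = full_attention(q, k, v)
sparse = vertical_vector_attention(q, k, v, block_size=4, keep=4)
```

With `keep` equal to the number of keys, the sketch reduces to dense attention; smaller values trade accuracy for less computation, which is the trade-off the abstract refers to. A real kernel would fuse selection and computation on the GPU rather than loop in Python.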

Code & Models

Repositories

anminliu/VecAttention (GitHub)
