AnchorAttention: Difference-Aware Sparse Attention with Stripe Granularity

Yu Zhang; Dong Guo; Fang Wu; Guoliang Zhu; Dian Ding; Yiming Zhang

arXiv:2505.23520·cs.LG·May 30, 2025

TL;DR

AnchorAttention introduces a difference-aware, stripe-granularity sparse attention mechanism that enhances efficiency and accuracy in large language models by identifying critical attention regions more precisely.

Contribution

It proposes a novel difference-aware sparse attention method with stripe granularity, improving speed and recall over previous approaches in long-context language modeling.

Findings

1. Achieves a 1.44× speedup at a 128k-token context length
2. Maintains higher recall rates than previous methods
3. Reduces computation time significantly

Abstract

Large Language Models (LLMs) with extended context lengths face significant computational challenges during the pre-filling phase, primarily due to the quadratic complexity of self-attention. Existing methods typically employ dynamic pattern matching and block-sparse low-level implementations. However, their reliance on local information for pattern identification fails to capture global contexts, and the coarse granularity of blocks leads to persistent internal sparsity, resulting in suboptimal accuracy and efficiency. To address these limitations, we propose AnchorAttention, a difference-aware, dynamic sparse attention mechanism that efficiently identifies critical attention regions at a finer stripe granularity while adapting to global contextual information, achieving superior speed and accuracy. AnchorAttention comprises three key components: (1) Pattern-based…
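
The abstract's remark that coarse blocks leave "persistent internal sparsity" can be made concrete with a toy count. The sketch below is illustrative only and is not taken from the paper or its repository: the helper names (`toy_mask`, `block_cover`, `stripe_cover`), the mask pattern, and the reading of a "stripe" as a single key column within a block of query rows are all assumptions. It counts how many attention entries would be computed if whole blocks are kept versus if only individual key columns are kept.

```python
import numpy as np


def toy_mask(n):
    """Hypothetical 'important entries' mask: first key column, diagonal, one extra column."""
    mask = np.zeros((n, n), dtype=bool)
    mask[:, 0] = True                          # every query attends to the first key
    mask[np.arange(n), np.arange(n)] = True    # every query attends to itself
    mask[2:, 2] = True                         # one additional key column, causal rows only
    return mask


def block_cover(important, block):
    """Entries computed if every block containing an important entry is computed in full."""
    n = important.shape[0]
    computed = 0
    for r in range(0, n, block):
        for c in range(0, n, block):
            tile = important[r:r + block, c:c + block]
            if tile.any():
                computed += tile.size
    return computed


def stripe_cover(important, block):
    """Entries computed if, per block of query rows, only needed key columns are kept.

    Assumption: a 'stripe' is one key column spanning one block of query rows.
    """
    n = important.shape[0]
    computed = 0
    for r in range(0, n, block):
        rows = important[r:r + block]
        needed_cols = rows.any(axis=0)         # key columns with at least one important entry
        computed += int(needed_cols.sum()) * rows.shape[0]
    return computed


if __name__ == "__main__":
    mask = toy_mask(8)
    print("important entries:", int(mask.sum()))
    print("block granularity:", block_cover(mask, block=4))
    print("stripe granularity:", stripe_cover(mask, block=4))
```

In this toy setting the stripe count can never exceed the block count, because every kept column lies inside some kept block. How much is saved in practice depends on the real attention pattern, which this sketch does not model.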

Code & Models

Repositories

yuzhouzhang9/anchor-attention (PyTorch, official)

Videos

AnchorAttention: Difference-Aware Sparse Attention with Stripe Granularity (Underline)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Softmax · Attention Is All You Need · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Sparse Evolutionary Training