RRAttention: Dynamic Block Sparse Attention via Per-Head Round-Robin Shifts for Long-Context Inference

Siran Liu; Guoxia Wang; Sa Wang; Jinle Zeng; HaoYang Xie; Siyu Lou; JiaBin Yang; DianHai Yu; Haifeng Wang; Chao Yang

arXiv:2602.05853 · cs.CL · February 6, 2026

TL;DR

RRAttention is a dynamic sparse attention mechanism whose round-robin sampling strategy maintains query independence, reduces computational complexity, and achieves near-full-attention performance on long-context tasks.

Contribution

The paper proposes RRAttention, a novel dynamic sparse attention method that combines global pattern discovery, query independence, and efficiency through a head round-robin sampling strategy.

Findings

01. Achieves over 99% of full attention performance.

02. Reduces complexity from O(L^2) to O(L^2/S^2).

03. Provides 2.4× speedup at 128K context length.
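
To give a sense of scale for the complexity reduction listed above, the following worked arithmetic assumes that 128K means $L = 131072$ tokens and picks an illustrative stride $S = 16$; the stride value is a hypothetical choice, since this summary does not state which $S$ the authors use.

```latex
\[
\frac{L^{2}}{S^{2}} = \left(\frac{L}{S}\right)^{2}, \qquad
L = 131072,\; S = 16
\;\Longrightarrow\;
\left(\frac{131072}{16}\right)^{2} = 8192^{2} = 67\,108\,864 \approx 6.7 \times 10^{7},
\]
\[
L^{2} = 131072^{2} = 17\,179\,869\,184 \approx 1.7 \times 10^{10},
\qquad
\frac{L^{2}}{L^{2}/S^{2}} = S^{2} = 256 .
\]
```

Under these assumptions the term governed by $O(L^2/S^2)$ shrinks by a factor of $S^2 = 256$. This is not the same quantity as the reported 2.4× speedup, which is a measured end-to-end figure.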

Abstract

The quadratic complexity of attention mechanisms poses a critical bottleneck for large language models processing long contexts. While dynamic sparse attention methods offer input-adaptive efficiency, they face fundamental trade-offs: requiring preprocessing, lacking global evaluation, violating query independence, or incurring high computational overhead. We present RRAttention, a novel dynamic sparse attention method that simultaneously achieves all desirable properties through a head round-robin (RR) sampling strategy. By rotating query sampling positions across attention heads within each stride, RRAttention maintains query independence while enabling efficient global pattern discovery with stride-level aggregation. Our method reduces complexity from $O(L^{2})$ to $O(L^{2}/S^{2})$ and employs adaptive Top-$\tau$ selection for optimal sparsity. Extensive experiments…
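
The two mechanisms named in the abstract, per-head rotation of the sampled query position within each stride and adaptive Top-$\tau$ selection, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `rr_sample_positions` and `top_tau_blocks` are hypothetical, the rotation rule `head % stride` is an assumed reading of "rotating query sampling positions across attention heads", and Top-$\tau$ is interpreted here as keeping the smallest set of blocks whose softmax mass reaches $\tau$.

```python
import numpy as np

def rr_sample_positions(seq_len: int, stride: int, head: int) -> np.ndarray:
    """Query positions sampled by one head: one per stride.

    Assumption: the offset inside each stride is rotated by head index.
    """
    offset = head % stride
    return np.arange(offset, seq_len, stride)

def top_tau_blocks(scores: np.ndarray, tau: float) -> np.ndarray:
    """Indices of the fewest blocks whose softmax mass reaches tau.

    Assumption: this cumulative-mass rule stands in for the paper's
    adaptive Top-tau selection, whose exact form is not given here.
    """
    probs = np.exp(scores - scores.max())   # numerically stable softmax
    probs /= probs.sum()
    order = np.argsort(-probs)              # blocks by descending mass
    cum = np.cumsum(probs[order])
    # First index where cumulative mass >= tau, clamped for rounding error.
    k = min(int(np.searchsorted(cum, tau)) + 1, len(order))
    return np.sort(order[:k])

if __name__ == "__main__":
    # With stride 4, heads 0..3 together touch every query position.
    for h in range(4):
        print(h, rr_sample_positions(8, 4, h))
    print(top_tau_blocks(np.array([0.0, 2.0, 1.0, -1.0]), 0.85))
```

In this sketch each head scores only `seq_len / stride` queries, yet across `stride` consecutive heads every query position is sampled once, which is one way to read the abstract's claim that sampling stays independent of any particular query while still covering the whole sequence.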

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling