SPLA: Block Sparse Plus Linear Attention for Long Context Modeling

Bailin Wang; Dan Friedman; Tao Lei; Chong Wang

arXiv:2601.22379·cs.CL·February 2, 2026

TL;DR

SPLA combines block-sparse attention with linear attention: it selects the most relevant blocks for exact attention and compresses the unselected blocks into a compact recurrent state, improving performance on long-context benchmarks.

Contribution

The paper proposes a block-selection metric derived from second-order Taylor expansions and a subtraction-based residual linear attention (RLA) formulation that avoids reading unselected blocks during inference.
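This page does not state the selection metric itself, so the following is only a hypothetical illustration of how a second-order Taylor expansion can produce a block score. It assumes the score approximates a block's unnormalized softmax mass, the sum of exp(q·k) over the block's keys, using exp(x) ≈ 1 + x + x²/2. Under that assumption the score depends only on three per-block statistics (token count, key sum, and key second moment), so individual keys need not be read at selection time. The names `block_stats`, `taylor_score`, and `exact_mass` are invented for this sketch and are not taken from the paper.

```python
import math
import random


def block_stats(keys):
    """Per-block summaries: token count, sum of keys, sum of outer products k k^T."""
    d = len(keys[0])
    key_sum = [sum(k[i] for k in keys) for i in range(d)]
    second_moment = [[sum(k[i] * k[j] for k in keys) for j in range(d)]
                     for i in range(d)]
    return len(keys), key_sum, second_moment


def taylor_score(query, stats):
    """Second-order Taylor approximation of sum_j exp(q . k_j) from block summaries.

    Assumption for illustration only: the paper's actual metric may differ.
    """
    n, key_sum, second_moment = stats
    d = len(query)
    first = sum(query[i] * key_sum[i] for i in range(d))
    second = sum(query[i] * second_moment[i][j] * query[j]
                 for i in range(d) for j in range(d))
    return n + first + 0.5 * second


def exact_mass(query, keys):
    """Exact unnormalized softmax mass of the block, for comparison."""
    return sum(math.exp(sum(q * k for q, k in zip(query, key))) for key in keys)


random.seed(0)
d, block_size = 4, 16
# Small-magnitude vectors keep q . k near zero, where the expansion is accurate.
block_keys = [[random.gauss(0, 0.3) for _ in range(d)] for _ in range(block_size)]
query = [random.gauss(0, 0.3) for _ in range(d)]

score = taylor_score(query, block_stats(block_keys))
print("taylor score:", score)
print("exact mass:  ", exact_mass(query, block_keys))
```

In this sketch the block summaries can be computed once per block and reused for every query, which is the property that makes a Taylor-style score cheap to evaluate. The approximation degrades as the dot products q·k move away from zero.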

Findings

01. Outperforms dense attention models on long-context benchmarks like RULER.

02. Maintains competitive general knowledge and reasoning capabilities.

03. Reduces IO overhead with an optimized subtraction-based RLA formulation.

Abstract

Block-wise sparse attention offers significant efficiency gains for long-context modeling, yet existing methods often suffer from low selection fidelity and cumulative contextual loss by completely discarding unselected blocks. To address these limitations, we introduce Sparse Plus Linear Attention (SPLA), a framework that utilizes a selection metric derived from second-order Taylor expansions to accurately identify relevant blocks for exact attention. Instead of discarding the remaining "long tail," SPLA compresses unselected blocks into a compact recurrent state via a residual linear attention (RLA) module. Crucially, to avoid IO overhead, we derive an optimized subtraction-based formulation for RLA -- calculating the residual as the difference between global and selected linear attention -- ensuring that unselected blocks are never explicitly accessed during inference. Our…
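The subtraction idea in the abstract rests on the linearity of linear attention: the state is a sum over tokens, so the state of the unselected blocks equals the global state minus the state of the selected blocks. The sketch below checks that identity on random data. It is a minimal illustration under stated assumptions, not the paper's implementation: the feature map `phi` (ELU + 1), the unnormalized readout, the block size, and all function names are choices made for this example, and causal masking is omitted.

```python
import math
import random


def phi(x):
    """Assumed feature map (ELU + 1); the paper's choice is not given on this page."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]


def kv_state(keys, values):
    """Linear-attention state: sum over tokens of the outer product phi(k) v^T."""
    d_k, d_v = len(keys[0]), len(values[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    for k, v in zip(keys, values):
        fk = phi(k)
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] += fk[i] * v[j]
    return state


def read(query, state):
    """Unnormalized linear-attention output phi(q)^T S."""
    fq = phi(query)
    return [sum(fq[i] * state[i][j] for i in range(len(fq)))
            for j in range(len(state[0]))]


def subtract(a, b):
    """Elementwise difference of two states."""
    return [[x - y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def tokens_of(blocks, seq, block_size):
    """Gather the tokens belonging to the given block indices."""
    return [seq[t] for b in sorted(blocks)
            for t in range(b * block_size, (b + 1) * block_size)]


random.seed(0)
d, block_size, n_blocks = 4, 8, 6
n_tokens = block_size * n_blocks
keys = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_tokens)]
values = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_tokens)]
query = [random.gauss(0, 1) for _ in range(d)]
selected = {1, 4}  # hypothetical blocks chosen for exact attention

# Subtraction route: only the global state and the selected blocks are touched.
global_state = kv_state(keys, values)
selected_state = kv_state(tokens_of(selected, keys, block_size),
                          tokens_of(selected, values, block_size))
residual_out = read(query, subtract(global_state, selected_state))

# Direct route (for verification only): explicitly read every unselected block.
unselected = set(range(n_blocks)) - selected
direct_out = read(query, kv_state(tokens_of(unselected, keys, block_size),
                                  tokens_of(unselected, values, block_size)))

max_abs_diff = max(abs(a - b) for a, b in zip(residual_out, direct_out))
print("max abs difference:", max_abs_diff)
```

Here the global state is recomputed from scratch for clarity. In a recurrent setting it would presumably be maintained incrementally, so the subtraction route only needs the selected blocks, which exact attention already loads.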

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Machine Learning in Healthcare · Explainable Artificial Intelligence (XAI)