The Key to State Reduction in Linear Attention: A Rank-based Perspective

Philipp Nazari; T. Konstantin Rusch

arXiv:2602.04852·cs.LG·February 13, 2026

TL;DR

This paper investigates the low-rank structure of the hidden state in trained linear attention models, analyzes how low effective rank affects retrieval error, and proposes a structured pruning method for key and query matrices that reduces state size after training with minimal performance loss.

Contribution

The paper gives a theoretical account of how low effective rank affects linear attention and introduces a structured pruning approach based on QR decomposition for reducing the state size.
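This summary does not spell out how the QR decomposition is used. As one plausible illustration, and not necessarily the authors' procedure, a greedy column-pivoted QR can rank key channels by how much new direction each one contributes, so that the top-ranked channels are the ones retained. The function name `pivoted_qr_order`, the toy matrix, and the keep-half rule are all hypothetical.

```python
import numpy as np

def pivoted_qr_order(K, tol=1e-12):
    """Rank the columns of K by greedy column-pivoted QR (hypothetical helper).

    K has shape (num_tokens, d_k); each column is one key channel.
    Returns the column indices, most informative first.
    """
    R = np.array(K, dtype=float)          # working copy, deflated step by step
    remaining = list(range(R.shape[1]))
    order = []
    while remaining:
        # Pick the remaining column with the largest residual norm.
        norms = np.linalg.norm(R[:, remaining], axis=0)
        j = remaining[int(np.argmax(norms))]
        order.append(j)
        remaining.remove(j)
        nrm = np.linalg.norm(R[:, j])
        if nrm < tol:
            # Everything left is (numerically) zero: keep index order and stop.
            order.extend(remaining)
            break
        if remaining:
            # Remove the chosen direction from all remaining columns.
            u = R[:, j] / nrm
            R[:, remaining] -= np.outer(u, u @ R[:, remaining])
    return order

# Toy keys: channel 1 is almost a copy of channel 0, so it ranks last.
K = np.array([
    [3.0, 2.9, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.5],
])
order = pivoted_qr_order(K)
keep = order[: K.shape[1] // 2]           # assumed rule: keep half the channels
print(order, keep)                        # [0, 2, 3, 1] [0, 2]
```

In this toy case the redundant channel is dropped first, which is the intuition behind using a rank-revealing factorization to choose what to prune.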

Findings

01. Low effective rank amplifies query noise in linear attention.

02. Pruning key and query matrices can reduce state size by 50%.

03. Minimal performance degradation observed after pruning.
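The 50% figure follows from the shape of the state. In standard (unnormalized) linear attention the state is a d_k × d_v matrix, so dropping half of the key/query channels halves it. The sketch below is a toy under stated assumptions: the retained index set `keep` is hypothetical, and the dropped key channels are zeroed on purpose so that pruning is exact. In a trained model the dropped channels are only approximately unused, so some error is expected.

```python
import numpy as np

def linear_attention(Q, K, V):
    """Unnormalized causal linear attention computed through the state S."""
    d_k, d_v = K.shape[1], V.shape[1]
    S = np.zeros((d_k, d_v))
    outputs = []
    for q, k, v in zip(Q, K, V):
        S = S + np.outer(k, v)            # state update: S_t = S_{t-1} + k_t v_t^T
        outputs.append(q @ S)             # readout:      o_t = S_t^T q_t
    return np.stack(outputs), S

rng = np.random.default_rng(0)
T, d_k, d_v = 16, 8, 4
Q = rng.normal(size=(T, d_k))
K = rng.normal(size=(T, d_k))
V = rng.normal(size=(T, d_v))

keep = np.arange(d_k // 2)                # hypothetical set of retained channels
K[:, d_k // 2:] = 0.0                     # toy assumption: dropped channels carry nothing

out_full, S_full = linear_attention(Q, K, V)
out_small, S_small = linear_attention(Q[:, keep], K[:, keep], V)

print(S_full.shape, S_small.shape)        # (8, 4) (4, 4)
print(np.allclose(out_full, out_small))   # True
```

Because the same indices are removed from both queries and keys, the pruned model is still an ordinary linear attention layer with a smaller key dimension.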

Abstract

Linear attention offers a computationally efficient yet expressive alternative to softmax attention. However, recent empirical results indicate that the hidden state of trained linear attention models often exhibits a low-rank structure, suggesting that these models underexploit their capacity in practice. To illuminate this phenomenon, we provide a theoretical analysis of the role of rank in linear attention, revealing that low effective rank can affect retrieval error by amplifying query noise. In addition to these theoretical insights, we conjecture that the low-rank states can be substantially reduced post-training with only minimal performance degradation, yielding faster and more memory-efficient models. To this end, we propose a novel hardware-aware approach that structurally prunes key and query matrices, reducing the state size while retaining compatibility with existing CUDA…
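To make "effective rank" concrete, here is a minimal sketch using the entropy-based definition (the exponential of the Shannon entropy of the normalized singular values). This definition is an assumption for illustration; the abstract does not say which one the paper uses, and the helper name `effective_rank` is hypothetical.

```python
import numpy as np

def effective_rank(S, eps=1e-12):
    """Entropy-based effective rank of a matrix (assumed definition)."""
    s = np.linalg.svd(S, compute_uv=False)
    p = s / (s.sum() + eps)               # singular values as a distribution
    p = p[p > eps]                        # skip zeros to avoid log(0)
    return float(np.exp(-(p * np.log(p)).sum()))

# A state that uses every direction equally has full effective rank.
full = effective_rank(np.eye(4))

# A state built from a single key-value outer product has effective rank ~1.
k = np.array([1.0, 2.0, 3.0, 4.0])
v = np.array([1.0, -1.0, 0.5, 2.0])
low = effective_rank(np.outer(k, v))

print(round(full, 3), round(low, 3))      # 4.0 1.0
```

A trained state whose effective rank is far below its nominal dimension is the situation the abstract describes as underexploited capacity.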

Taxonomy

Topics: Advanced Neural Network Applications · Age of Information Optimization · Multimodal Machine Learning Applications