The Key to State Reduction in Linear Attention: A Rank-based Perspective
Philipp Nazari, T. Konstantin Rusch

TL;DR
This paper investigates the low-rank structure of linear attention models, provides a theoretical analysis of its impact, and proposes a structured pruning method to significantly reduce state size with minimal performance loss.
Contribution
The paper offers a theoretical account of low-rank effects in linear attention and introduces a structured pruning approach based on QR decomposition that reduces the state size.
Findings
Low effective rank of the hidden state can degrade retrieval in linear attention by amplifying query noise.
Pruning key and query matrices can reduce state size by 50%.
Performance degrades only minimally after pruning.
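To make the state-size claim concrete, here is a minimal NumPy sketch of why removing matching columns from the key and query projections shrinks the recurrent state. It assumes unnormalized causal linear attention with state S_t = sum over s <= t of k_s v_s^T. The column-selection rule used here (keep the key columns with the largest norm) is a placeholder chosen for illustration only; it is not the QR-based criterion the paper proposes, and all names and dimensions are hypothetical.

```python
import numpy as np


def linear_attention(q, k, v):
    """Unnormalized causal linear attention.

    The state has shape (d_k, d_v) and is updated as S_t = S_{t-1} + k_t v_t^T;
    the output at step t is o_t = S_t^T q_t.
    """
    d_k, d_v = k.shape[1], v.shape[1]
    state = np.zeros((d_k, d_v))
    outputs = np.empty((q.shape[0], d_v))
    for t in range(q.shape[0]):
        state += np.outer(k[t], v[t])
        outputs[t] = q[t] @ state
    return outputs, state


rng = np.random.default_rng(0)
seq_len, d_model, d_k, d_v = 16, 32, 8, 8  # hypothetical sizes
x = rng.standard_normal((seq_len, d_model))
w_q = rng.standard_normal((d_model, d_k))
w_k = rng.standard_normal((d_model, d_k))
w_v = rng.standard_normal((d_model, d_v))

# Placeholder selection rule (NOT the paper's QR-based method):
# keep the half of the key dimensions whose projection columns have the largest norm.
n_keep = d_k // 2
keep = np.sort(np.argsort(np.linalg.norm(w_k, axis=0))[-n_keep:])

# Structured pruning: drop the same columns from both projections.
w_q_pruned, w_k_pruned = w_q[:, keep], w_k[:, keep]

_, full_state = linear_attention(x @ w_q, x @ w_k, x @ w_v)
_, pruned_state = linear_attention(x @ w_q_pruned, x @ w_k_pruned, x @ w_v)

print(full_state.shape, pruned_state.shape)
```

Because the state is K^T V, dropping key dimensions removes the corresponding rows of the state, so halving the key/query width halves the state. The pruned matrices stay dense, which is why this kind of structured pruning can reuse kernels written for the unpruned model; how much accuracy survives depends entirely on which dimensions are kept.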
Abstract
Linear attention offers a computationally efficient yet expressive alternative to softmax attention. However, recent empirical results indicate that the hidden state of trained linear attention models often exhibits a low-rank structure, suggesting that these models underexploit their capacity in practice. To illuminate this phenomenon, we provide a theoretical analysis of the role of rank in linear attention, revealing that low effective rank can affect retrieval error by amplifying query noise. In addition to these theoretical insights, we conjecture that the low-rank states can be substantially reduced post-training with only minimal performance degradation, yielding faster and more memory-efficient models. To this end, we propose a novel hardware-aware approach that structurally prunes key and query matrices, reducing the state size while retaining compatibility with existing CUDA…
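As a rough illustration of the low-rank observation in the abstract, the sketch below measures the effective rank of a linear attention state. It assumes one common definition, the exponential of the Shannon entropy of the normalized singular values; the paper may use a different definition, and the function name, tolerance, and dimensions are hypothetical.

```python
import numpy as np


def effective_rank(matrix, eps=1e-12):
    """Entropy-based effective rank: exp(H(p)), with p the normalized singular values.

    This is one common definition and is an assumption here, not necessarily
    the one used in the paper.
    """
    s = np.linalg.svd(matrix, compute_uv=False)
    p = s / (s.sum() + eps)
    p = p[p > eps]  # ignore numerically zero singular values
    return float(np.exp(-(p * np.log(p)).sum()))


rng = np.random.default_rng(0)
seq_len, d_k, d_v = 64, 8, 8  # hypothetical sizes

# Keys confined to a 2-dimensional subspace of the d_k-dimensional key space.
basis = rng.standard_normal((2, d_k))
keys = rng.standard_normal((seq_len, 2)) @ basis
values = rng.standard_normal((seq_len, d_v))

# Final state of unnormalized linear attention: S = sum_t k_t v_t^T = K^T V.
state = keys.T @ values

print(effective_rank(state), "out of", d_k)
```

When the keys occupy only a few directions, the state's effective rank stays well below its nominal size d_k, which is the kind of unused capacity that post-training state reduction targets.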
Taxonomy
Topics: Advanced Neural Network Applications · Age of Information Optimization · Multimodal Machine Learning Applications
