Softmax Linear Attention: Reclaiming Global Competition
Mingwei Xu, Xuan Lin, Xinnan Guo, Wanqing Xu, Wanyun Cui

TL;DR
This paper introduces Softmax Linear Attention (SLA), a novel framework that restores global competition in linear attention models by lifting softmax normalization to the head level, improving focus and robustness in long-context tasks.
Contribution
SLA reintroduces softmax-based competition at the head level in linear attention, enhancing expressivity and focus without increasing computational complexity.
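The head-level competition described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: this summary does not give the exact gate parameterization, so the gate weights `w_gate`, the `elu(x) + 1` feature map, and the function name `head_gated_linear_attention` are all assumptions made for illustration. The sketch runs causal linear attention per head, then applies a softmax across heads (not across tokens) so that heads compete for each query position.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1: a common positive feature map for linear attention (assumed here).
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def head_gated_linear_attention(q, k, v, w_gate):
    """Hypothetical sketch of head-level softmax gating.

    q, k: (H, T, d); v: (H, T, dv); w_gate: (H, d) illustrative gate weights.
    Returns gated head outputs (T, H, dv) and the gates (T, H).
    """
    phi_q, phi_k = feature_map(q), feature_map(k)
    # Causal linear attention via running sums: cost grows linearly with T.
    # (Materializing every prefix state is for clarity, not memory efficiency.)
    kv = np.cumsum(np.einsum("htd,hte->htde", phi_k, v), axis=1)  # (H, T, d, dv)
    z = np.cumsum(phi_k, axis=1)                                  # (H, T, d)
    num = np.einsum("htd,htde->hte", phi_q, kv)                   # (H, T, dv)
    den = np.einsum("htd,htd->ht", phi_q, z)[..., None]           # (H, T, 1), > 0
    heads = num / den
    # Head-level competition: one logit per head per token, softmax over heads.
    logits = np.einsum("htd,hd->th", q, w_gate)                   # (T, H)
    gates = softmax(logits, axis=-1)
    return gates[..., None] * heads.transpose(1, 0, 2), gates
```

Because the gates for each token sum to 1 across heads, raising one head's logit necessarily lowers the weight of the others, which is the competitive behavior that token-level softmax provides in standard attention.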
Findings
SLA improves on state-of-the-art linear attention models across language modeling and long-context benchmarks.
SLA significantly boosts robustness in noisy retrieval scenarios.
SLA maintains linear complexity while enhancing focus and expressivity.
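The linear-complexity claim rests on a general property of linear attention, not something specific to SLA: once the softmax is replaced by a positive feature map, the matrix products can be reassociated so that the T x T score matrix is never formed. The sketch below checks this for a single head in the non-causal case. The function names are illustrative, and the inputs are assumed to be already feature-mapped and strictly positive.

```python
import numpy as np

def quadratic_form(phi_q, phi_k, v):
    # Builds the full (T, T) score matrix: O(T^2) time and memory.
    scores = phi_q @ phi_k.T
    return (scores @ v) / scores.sum(axis=1, keepdims=True)

def linear_form(phi_q, phi_k, v):
    # Reassociated as phi_q @ (phi_k.T @ v): O(T) in sequence length.
    kv = phi_k.T @ v          # (d, dv) summary of all keys and values
    z = phi_k.sum(axis=0)     # (d,) normalizer
    return (phi_q @ kv) / (phi_q @ z)[:, None]

# Positive features, as a feature map such as elu(x) + 1 would produce.
rng = np.random.default_rng(0)
T, d, dv = 16, 4, 3
phi_q = rng.uniform(0.1, 1.0, size=(T, d))
phi_k = rng.uniform(0.1, 1.0, size=(T, d))
v = rng.normal(size=(T, dv))
same = np.allclose(quadratic_form(phi_q, phi_k, v), linear_form(phi_q, phi_k, v))
```

The two forms agree up to floating-point error, so `same` is true here. The normalizer in the linear form is a plain sum over keys with no exponential, so no token can dominate the way it can under softmax. That missing competition is what SLA aims to restore at the head level.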
Abstract
While linear attention reduces the quadratic complexity of standard Transformers to linear time, it often lags behind in expressivity due to the removal of softmax normalization. This omission eliminates global competition, a critical mechanism that enables models to sharply focus on relevant information amidst long-context noise. In this work, we propose Softmax Linear Attention (SLA), a framework designed to restore this competitive selection without sacrificing efficiency. By lifting the softmax operation from the token level to the head level, SLA leverages attention heads as coarse semantic slots, applying a competitive gating mechanism to dynamically select the most relevant subspaces. This reintroduces the "winner-take-all" dynamics essential for precise retrieval and robust long-context understanding. Distinct from prior methods that focus on refining local…
Taxonomy
Topics: Topic Modeling · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications
