Breaking the Low-Rank Dilemma of Linear Attention

Qihang Fan; Huaibo Huang; Ran He

arXiv:2411.07635 · cs.CV · March 12, 2025

TL;DR

This paper introduces Rank-Augmented Linear Attention (RALA) to overcome the low-rank limitations of linear attention, achieving performance comparable to Softmax attention in vision tasks while maintaining linear complexity.

Contribution

The paper proposes RALA, a novel linear attention method that addresses the low-rank issue, and constructs RAVLT, a vision transformer that outperforms previous linear attention models.

Findings

1. RAVLT achieves 84.4% Top-1 accuracy on ImageNet-1k.
2. RALA rivals Softmax attention performance with linear complexity.
3. The approach significantly surpasses previous linear attention mechanisms.

Abstract

The Softmax attention mechanism in Transformer models is notoriously computationally expensive, particularly due to its quadratic complexity, posing significant challenges in vision applications. In contrast, linear attention provides a far more efficient solution by reducing the complexity to linear levels. However, compared to Softmax attention, linear attention often experiences significant performance degradation. Our experiments indicate that this performance drop is due to the low-rank nature of linear attention's feature map, which hinders its ability to adequately model complex spatial information. In this paper, to break the low-rank dilemma of linear attention, we conduct rank analysis from two perspectives: the KV buffer and the output features. Consequently, we introduce Rank-Augmented Linear Attention (RALA), which rivals the performance of Softmax attention while…
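
As a rough illustration of the low-rank issue the abstract describes, the sketch below compares the rank of a Softmax attention matrix with that of a generic kernel-based linear attention matrix. It does not implement RALA, whose details are not given on this page. The feature map `elu(x) + 1`, the function names, and the toy sizes `n = 64`, `d = 8` are all assumptions chosen for illustration.

```python
import numpy as np

def softmax_attention_matrix(q, k):
    """N x N Softmax attention weights (cost grows quadratically with N)."""
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)

def feature_map(x):
    """Assumed positive feature map elu(x) + 1; not taken from the paper."""
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(q, k, v):
    """Generic linear attention: aggregate keys and values into a d x d_v
    KV buffer first, so the N x N matrix is never formed."""
    phi_q, phi_k = feature_map(q), feature_map(k)
    kv = phi_k.T @ v                   # (d, d_v) KV buffer
    z = phi_q @ phi_k.sum(axis=0)      # (N,) per-query normalizer
    return (phi_q @ kv) / z[:, None]

def linear_attention_matrix(q, k):
    """The implicit N x N weights of linear attention, built explicitly
    here only so that its rank can be inspected."""
    phi_q, phi_k = feature_map(q), feature_map(k)
    a = phi_q @ phi_k.T
    return a / a.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n, d = 64, 8                           # toy token count and head dimension
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))

rank_softmax = np.linalg.matrix_rank(softmax_attention_matrix(q, k))
rank_linear = np.linalg.matrix_rank(linear_attention_matrix(q, k))
print("softmax rank:", rank_softmax, "| linear rank:", rank_linear, "| d:", d)
```

Because the implicit linear attention matrix is a product of an `n x d` and a `d x n` factor (row normalization only rescales rows), its rank cannot exceed `d`, however many tokens there are. The Softmax matrix has no such structural bound, which is one way to read the "low-rank" gap the abstract refers to.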

Code & Models

Repositories

qhfan/rala (PyTorch, official)

Models

aldjalkdf/RAVLT (Hugging Face model)

Taxonomy

Topics: Cognitive Science and Education Research

Methods: Attention Is All You Need · Linear Layer · Dense Connections · Label Smoothing · Absolute Position Encodings · Layer Normalization · Position-Wise Feed-Forward Layer · Adam · Multi-Head Attention · Residual Connection