SoLA-Vision: Fine-grained Layer-wise Linear Softmax Hybrid Attention
Ruibang Li, Guan Luo, Yiwei Zhang, Jin Gao, Bing Li, Weiming Hu

TL;DR
SoLA-Vision introduces a flexible layer-wise hybrid attention mechanism combining linear and softmax attention, achieving high accuracy with reduced computational cost in vision tasks.
Contribution
It provides an analytical comparison of linear and softmax attention and proposes a novel hybrid model with fine-grained layer-wise control for improved performance.
Findings
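SoLA-Vision outperforms purely linear and other hybrid models on ImageNet-1K.
It surpasses strong baselines on dense prediction tasks.
Fine-grained layer-wise hybridization matches or surpasses rigid intra-block hybrid designs while requiring fewer softmax layers.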
Abstract
Standard softmax self-attention excels in vision tasks but incurs quadratic complexity O(N^2), limiting high-resolution deployment. Linear attention reduces the cost to O(N), yet its compressed state representations can impair modeling capacity and accuracy. We present an analytical study that contrasts linear and softmax attention for visual representation learning from a layer-stacking perspective. We further conduct systematic experiments on layer-wise hybridization patterns of linear and softmax attention. Our results show that, compared with rigid intra-block hybrid designs, fine-grained layer-wise hybridization can match or surpass performance while requiring fewer softmax layers. Building on these findings, we propose SoLA-Vision (Softmax-Linear Attention Vision), a flexible layer-wise hybrid attention backbone that enables fine-grained control over how linear and softmax…
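The complexity contrast described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the feature map (elu(x) + 1), the function names, and the layer pattern string "LLSL" are assumptions chosen for illustration, and projections, normalization, and MLP sublayers are omitted. Softmax attention materializes an N x N score matrix, while linear attention compresses keys and values into a d x d state, which is the "compressed state representation" the abstract refers to.

```python
import numpy as np

def softmax_attention(q, k, v):
    """Softmax attention: builds an (N, N) score matrix, so cost grows as O(N^2)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])            # (N, N)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v                                 # (N, d)

def feature_map(x):
    """Assumed positive feature map, elu(x) + 1; the paper's choice may differ."""
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def linear_attention(q, k, v):
    """Kernelized linear attention: keys/values are summarized in a (d, d) state, O(N)."""
    q, k = feature_map(q), feature_map(k)
    state = k.T @ v                  # (d, d) compressed key-value state
    norm = k.sum(axis=0)             # (d,) normalizer
    return (q @ state) / (q @ norm)[:, None]

def hybrid_stack(x, pattern):
    """Apply one attention layer per character: 'L' = linear, 'S' = softmax.

    The pattern string is a hypothetical way to express layer-wise control.
    """
    for kind in pattern:
        attn = softmax_attention if kind == "S" else linear_attention
        x = x + attn(x, x, x)        # residual connection only
    return x

rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 8))   # N = 16 tokens, d = 8 channels
out = hybrid_stack(tokens, "LLSL")      # one softmax layer among three linear ones
print(out.shape)                        # (16, 8)
```

Changing the pattern string changes how many softmax layers are used and where they sit, which is the kind of layer-wise choice the abstract says was studied; which patterns actually perform best is a result of the paper and is not reproduced here.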
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Memory and Neural Computing
