SoLA-Vision: Fine-grained Layer-wise Linear Softmax Hybrid Attention

Ruibang Li; Guan Luo; Yiwei Zhang; Jin Gao; Bing Li; Weiming Hu

arXiv:2601.11164 · cs.CV · January 19, 2026

TL;DR

SoLA-Vision introduces a flexible layer-wise hybrid attention mechanism combining linear and softmax attention, achieving high accuracy with reduced computational cost in vision tasks.

Contribution

The paper provides an analytical comparison of linear and softmax attention from a layer-stacking perspective and proposes a hybrid backbone with fine-grained, layer-wise control over how the two attention types are combined.

Findings

01 Outperforms pure linear attention models and other hybrid models on ImageNet-1K.

02 Surpasses strong baselines on dense prediction tasks.

03 Matches or surpasses rigid intra-block hybrid designs while requiring fewer softmax layers.

Abstract

Standard softmax self-attention excels in vision tasks but incurs quadratic complexity O(N^2), limiting high-resolution deployment. Linear attention reduces the cost to O(N), yet its compressed state representations can impair modeling capacity and accuracy. We present an analytical study that contrasts linear and softmax attention for visual representation learning from a layer-stacking perspective. We further conduct systematic experiments on layer-wise hybridization patterns of linear and softmax attention. Our results show that, compared with rigid intra-block hybrid designs, fine-grained layer-wise hybridization can match or surpass performance while requiring fewer softmax layers. Building on these findings, we propose SoLA-Vision (Softmax-Linear Attention Vision), a flexible layer-wise hybrid attention backbone that enables fine-grained control over how linear and softmax…
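As a rough illustration of the cost contrast the abstract describes, the NumPy sketch below places a softmax attention layer (which materializes an N x N score matrix) next to a kernelized linear attention layer (which compresses keys and values into a d x d state), and stacks them according to a per-layer pattern string. The `elu(x) + 1` feature map, the single-head setup without learned projections, the residual connection, and the `hybrid_stack` helper with its `"LLSL"`-style pattern are all assumptions made for illustration; they are not taken from the paper and should not be read as SoLA-Vision's actual design.

```python
import numpy as np

def softmax_attention(q, k, v):
    """Softmax attention: builds an (N, N) weight matrix, so cost grows as O(N^2)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

def feature_map(x):
    """elu(x) + 1, a strictly positive feature map (an assumed choice)."""
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def linear_attention(q, k, v):
    """Linear attention: keys/values are summarized in a (d, d) state, cost O(N)."""
    q, k = feature_map(q), feature_map(k)
    kv_state = k.T @ v            # (d, d) compressed state, independent of N
    k_sum = k.sum(axis=0)         # (d,) normalizer state
    return (q @ kv_state) / (q @ k_sum)[:, None]

def hybrid_stack(x, pattern):
    """Apply one attention layer per character: 'L' = linear, 'S' = softmax.

    Hypothetical helper: learned projections, norms, and MLPs are omitted.
    """
    layers = {"L": linear_attention, "S": softmax_attention}
    for kind in pattern:
        if kind not in layers:
            raise ValueError(f"unknown layer type: {kind!r}")
        x = x + layers[kind](x, x, x)  # residual self-attention
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens = rng.standard_normal((196, 32))   # e.g. 14 x 14 patches, dim 32
    out = hybrid_stack(tokens, "LLSL")        # three linear layers, one softmax
    print(out.shape)
```

Changing the pattern string is the whole point of the sketch: the number and position of `S` layers can be varied independently per layer, rather than being fixed by a repeated block template.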

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Memory and Neural Computing