S2AFormer: Strip Self-Attention for Efficient Vision Transformer

Guoan Xu; Wenfeng Huang; Wenjing Jia; Jiamao Li; Guangwei Gao; Guo-Jun Qi

arXiv:2505.22195 · cs.CV · December 1, 2025

Open Access

TL;DR

S2AFormer introduces a novel Strip Self-Attention mechanism that reduces computational complexity in Vision Transformers, combining CNN local perception with global attention, leading to efficient and accurate vision models.

Contribution

The paper proposes S2AFormer with Strip Self-Attention, a new method that significantly reduces computation while maintaining accuracy in Vision Transformers.

Findings

1. Achieves higher accuracy on ImageNet-1k with less computation.

2. Demonstrates superior efficiency on multiple vision benchmarks.

3. Maintains robustness across different hardware environments.

Abstract

Vision Transformer (ViT) has made significant advancements in computer vision, thanks to its token mixer's sophisticated ability to capture global dependencies between all tokens. However, the quadratic growth in computational demands as the number of tokens increases limits its practical efficiency. Although recent methods have combined the strengths of convolutions and self-attention to achieve better trade-offs, the expensive pairwise token affinity and complex matrix operations inherent in self-attention remain a bottleneck. To address this challenge, we propose S2AFormer, an efficient Vision Transformer architecture featuring novel Strip Self-Attention (SSA). We design simple yet effective Hybrid Perception Blocks (HPBs) to integrate the local perception capabilities of CNNs with the global context modeling of the Transformer's attention mechanism. A key innovation of SSA…
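
The quadratic cost mentioned in the abstract can be made concrete with a rough count. The sketch below is not the paper's SSA; it only estimates the multiply-accumulate operations of the two N x N matrix products in standard single-head self-attention. The function names, the 16-pixel patch size, and the 64-dimensional head are illustrative assumptions, not values taken from the paper.

```python
def tokens_for_image(side: int, patch: int = 16) -> int:
    """Number of tokens for a square image split into square patches.

    The 16-pixel patch size is an assumption chosen for illustration.
    """
    return (side // patch) ** 2


def attention_matmul_macs(num_tokens: int, dim: int) -> int:
    """Multiply-accumulates for the two N x N products in one attention head.

    Counts only Q @ K^T and (attention weights) @ V; projections and
    softmax are ignored, since they do not grow quadratically in N.
    """
    scores = num_tokens * num_tokens * dim  # Q @ K^T -> (N, N)
    mixing = num_tokens * num_tokens * dim  # weights @ V -> (N, dim)
    return scores + mixing


if __name__ == "__main__":
    head_dim = 64  # hypothetical head dimension
    for side in (224, 448, 896):
        n = tokens_for_image(side)
        print(f"{side}px: {n} tokens, {attention_matmul_macs(n, head_dim)} MACs")
```

Doubling the image side quadruples the token count, so the pairwise-affinity cost grows sixteenfold. This is the bottleneck that efficient attention designs aim to reduce.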

Taxonomy

Topics

Advanced Neural Network Applications · Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices