S$^2$-MLPv2: Improved Spatial-Shift MLP Architecture for Vision
Tan Yu, Xu Li, Yunfeng Cai, Mingming Sun, Ping Li

TL;DR
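S$^2$-MLPv2 improves the spatial-shift MLP (S$^2$-MLP) vision backbone by expanding the feature map along the channel dimension, applying different spatial shifts to the split parts, fusing them with split-attention, and adopting a pyramid structure; it reaches 83.6% top-1 accuracy on ImageNet-1K without self-attention.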
Contribution
The paper introduces S$^2$-MLPv2, an improved vision backbone that incorporates feature expansion, split-attention, and pyramid structures for better image recognition performance.
Findings
Achieves 83.6% top-1 accuracy on ImageNet-1K.
Outperforms previous MLP-based models while using no self-attention.
The medium-scale model uses 55M parameters.
Abstract
MLP-based vision backbones have recently emerged. MLP-based vision architectures with less inductive bias achieve competitive performance in image recognition compared with CNNs and vision Transformers. Among them, spatial-shift MLP (S$^2$-MLP), which adopts a straightforward spatial-shift operation, achieves better performance than pioneering works including MLP-Mixer and ResMLP. More recently, using smaller patches with a pyramid structure, Vision Permutator (ViP) and Global Filter Network (GFNet) achieve better performance than S$^2$-MLP. In this paper, we improve the S$^2$-MLP vision backbone. We expand the feature map along the channel dimension and split the expanded feature map into several parts. We conduct different spatial-shift operations on the split parts. Meanwhile, we exploit the split-attention operation to fuse these split parts. Moreover, like the counterparts, we adopt…
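The expand, split, shift, and fuse steps described in the abstract can be sketched in NumPy as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the choice of three parts, the two shift orders, the ReLU nonlinearity, the weight shapes, and the function names (`spatial_shift`, `split_attention`, `s2_mlpv2_block`) are all hypothetical, and normalization, residual connections, and the pyramid structure are left out.

```python
import numpy as np

def spatial_shift(x, order):
    """Shift four channel groups of an (H, W, C) map by one pixel each.

    `order` names the direction used by each group. Border pixels keep
    their original values, so no padding is introduced.
    """
    out = x.copy()
    g = x.shape[-1] // 4  # channels per group
    for i, direction in enumerate(order):
        ch = slice(i * g, (i + 1) * g)
        if direction == "down":
            out[1:, :, ch] = x[:-1, :, ch]
        elif direction == "up":
            out[:-1, :, ch] = x[1:, :, ch]
        elif direction == "right":
            out[:, 1:, ch] = x[:, :-1, ch]
        elif direction == "left":
            out[:, :-1, ch] = x[:, 1:, ch]
    return out

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def split_attention(parts, w1, w2):
    """Fuse k parts of shape (H, W, C) using per-channel softmax weights."""
    stacked = np.stack(parts)                   # (k, H, W, C)
    k, c = stacked.shape[0], stacked.shape[-1]
    summary = stacked.sum(axis=(0, 1, 2))       # (C,) global descriptor
    hidden = np.maximum(summary @ w1, 0.0)      # ReLU used as an assumption
    weights = softmax((hidden @ w2).reshape(k, c), axis=0)  # (k, C)
    return (weights[:, None, None, :] * stacked).sum(axis=0)

def s2_mlpv2_block(x, w_expand, w1, w2, w_out):
    """Expand channels, split into three parts, shift two, fuse, project."""
    expanded = x @ w_expand                     # (H, W, 3C)
    x1, x2, x3 = np.split(expanded, 3, axis=-1)
    x1 = spatial_shift(x1, ("down", "up", "right", "left"))
    x2 = spatial_shift(x2, ("right", "left", "down", "up"))
    fused = split_attention([x1, x2, x3], w1, w2)  # x3 is left unshifted
    return fused @ w_out                        # (H, W, C)

# Usage with random weights (illustrative only, no training involved).
rng = np.random.default_rng(0)
h, w, c = 4, 4, 8
x = rng.standard_normal((h, w, c))
w_expand = rng.standard_normal((c, 3 * c))
w1 = rng.standard_normal((c, c))
w2 = rng.standard_normal((c, 3 * c))
w_out = rng.standard_normal((c, c))
y = s2_mlpv2_block(x, w_expand, w1, w2, w_out)
print(y.shape)  # (4, 4, 8)
```

Because the attention weights are a softmax over the parts, they sum to one per channel, so fusing identical parts returns the input unchanged; this gives a quick sanity check on the fusion step.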
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Vision and Imaging · Advanced Neural Network Applications
Methods: Feedforward Network · Affine Operator · Residual Multi-Layer Perceptrons · Average Pooling · Dropout · Global Average Pooling · Layer Normalization · Dense Connections · Residual Connection
