AS-MLP: An Axial Shifted MLP Architecture for Vision
Dongze Lian, Zehao Yu, Xing Sun, Shenghua Gao

TL;DR
The paper introduces AS-MLP, a novel axial shifted MLP architecture for vision tasks that effectively captures local dependencies and achieves competitive performance on image classification and downstream tasks.
Contribution
It proposes a pure MLP architecture with axial shifting to model local features, outperforming previous MLP models and rivaling transformer-based architectures.
Findings
Achieves 83.3% Top-1 accuracy on ImageNet-1K with 88M parameters and 15.2 GFLOPs
Outperforms all previous MLP-based architectures
Excels in downstream tasks like object detection and segmentation
Abstract
An Axial Shifted MLP architecture (AS-MLP) is proposed in this paper. Different from MLP-Mixer, where the global spatial feature is encoded for information flow through matrix transposition and one token-mixing MLP, we pay more attention to local feature interactions. By axially shifting channels of the feature map, AS-MLP is able to obtain the information flow from different axial directions, which captures the local dependencies. Such an operation enables us to utilize a pure MLP architecture to achieve the same local receptive field as a CNN-like architecture. We can also design the receptive field size and dilation of blocks of AS-MLP, etc., in the same spirit as convolutional neural networks. With the proposed AS-MLP architecture, our model obtains 83.3% Top-1 accuracy with 88M parameters and 15.2 GFLOPs on the ImageNet-1K dataset. Such a simple yet effective architecture…
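The axial shift described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the equal-sized channel grouping, the zero padding at the borders, and the default shift size of 3 are all assumptions made for illustration. The idea shown is that once each channel group is shifted by a different offset along one axis, the channel vector at a single position holds values from neighbouring positions, so a per-position channel-mixing MLP can combine local information.

```python
def shift_plane(plane, offset, axis):
    """Shift a 2-D plane (list of rows) by `offset` along axis 0 (height)
    or axis 1 (width). Vacated cells are zero-filled (an assumption)."""
    h, w = len(plane), len(plane[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Source coordinate that lands on (i, j) after the shift.
            si, sj = (i - offset, j) if axis == 0 else (i, j - offset)
            if 0 <= si < h and 0 <= sj < w:
                out[i][j] = plane[si][sj]
    return out


def axial_shift(x, shift_size=3, axis=1):
    """Hypothetical axial shift on a feature map x laid out as [C][H][W].

    Channels are split into `shift_size` groups (assumed grouping); group g
    is shifted by g - shift_size // 2, i.e. -1, 0, +1 for shift_size=3.
    """
    channels = len(x)
    group = -(-channels // shift_size)  # ceiling division
    half = shift_size // 2
    return [shift_plane(x[k], k // group - half, axis) for k in range(channels)]


# Three identical channels over a 1 x 5 map, shifted along the width axis.
x = [[[1, 2, 3, 4, 5]] for _ in range(3)]
y = axial_shift(x, shift_size=3, axis=1)

# Channel vector at width position 2 now mixes positions 3, 2 and 1.
print([y[k][0][2] for k in range(3)])  # [4, 3, 2]
```

Calling the same function with axis=0 gives the vertical counterpart. How the horizontal and vertical branches are combined is not stated in the text above, so it is left out of the sketch.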
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: Residual Connection · Average Pooling · Global Average Pooling · Dense Connections · Layer Normalization · Dropout · MLP-Mixer · Convolution
