CycleMLP: A MLP-like Architecture for Dense Prediction
Shoufa Chen, Enze Xie, Chongjian Ge, Runjian Chen, Ding Liang, Ping Luo

TL;DR
CycleMLP introduces a versatile MLP-like architecture that handles various image sizes efficiently, surpasses existing models in dense prediction tasks, and maintains low computational complexity, making it suitable for object detection and segmentation.
Contribution
The paper proposes CycleMLP, an MLP-like architecture with linear complexity and adaptability to different image sizes, outperforming existing MLPs and Transformer models in dense visual prediction tasks.
Findings
CycleMLP outperforms Swin-Tiny by 1.3% mIoU on ADE20K.
CycleMLP achieves competitive results in object detection and segmentation.
CycleMLP demonstrates strong zero-shot robustness on ImageNet-C.
Abstract
This paper presents a simple MLP-like architecture, CycleMLP, which is a versatile backbone for visual recognition and dense predictions. As compared to modern MLP architectures, e.g., MLP-Mixer, ResMLP, and gMLP, whose architectures are correlated to image size and thus are infeasible in object detection and segmentation, CycleMLP has two advantages compared to modern approaches. (1) It can cope with various image sizes. (2) It achieves linear computational complexity to image size by using local windows. In contrast, previous MLPs have O(N^2) computations due to fully spatial connections. We build a family of models which surpass existing MLPs and even state-of-the-art Transformer-based models, e.g., Swin Transformer, while using fewer parameters and FLOPs. We expand the MLP-like models' applicability, making them a versatile backbone for dense prediction tasks. CycleMLP achieves…
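The scaling argument in the abstract can be illustrated with a rough cost count. The sketch below is not the paper's FLOP formula: the helper names, the patch size of 4, the window of 7 positions, and the channel width of 64 are all assumptions chosen for illustration. It compares a fully spatial token-mixing layer, whose cost grows quadratically with the number of tokens N, against a local-window operation whose cost grows linearly in N.

```python
def tokens(height, width, patch=4):
    """Number of spatial tokens after non-overlapping patch embedding (patch size assumed)."""
    return (height // patch) * (width // patch)


def full_spatial_macs(n_tokens, channels):
    """Multiply-accumulates for a fully spatial mixing layer.

    Every token is connected to every other token, so the weight is N x N
    and it is applied once per channel: N * N * C.
    """
    return n_tokens * n_tokens * channels


def local_window_macs(n_tokens, channels, window=7):
    """Multiply-accumulates when each token mixes with a fixed-size window.

    The window size (assumed here) does not depend on the image size,
    so the cost is N * window * C, i.e. linear in N.
    """
    return n_tokens * window * channels


if __name__ == "__main__":
    channels = 64  # hypothetical channel width
    for side in (224, 448, 896):
        n = tokens(side, side)
        print(
            f"{side}x{side}: N={n}, "
            f"full={full_spatial_macs(n, channels):,}, "
            f"local={local_window_macs(n, channels):,}"
        )
```

Doubling the image side quadruples N, so the fully spatial count grows 16-fold while the local-window count grows 4-fold. The N x N weight in the fully spatial case also fixes the token count at construction time, which is why such layers are tied to a single image size.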
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: Attention Is All You Need · Linear Layer · Feedforward Network · Spatial Gating Unit · Affine Operator · gMLP · Residual Multi-Layer Perceptrons · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing
