ConvMLP: Hierarchical Convolutional MLPs for Vision
Jiachen Li, Ali Hassani, Steven Walton, Humphrey Shi

TL;DR
ConvMLP is a hierarchical, lightweight convolutional MLP architecture for visual recognition that transfers to object detection and segmentation with competitive accuracy and efficiency.
Contribution
The paper proposes a stage-wise convolutional MLP design that is easier to apply to downstream tasks and cheaper to compute than prior MLP-based models.
Findings
ConvMLP-S achieves 76.8% top-1 accuracy on ImageNet-1k with 9M parameters and 2.4G MACs (15% and 19% of MLP-Mixer-B/16, respectively).
Demonstrates competitive transfer learning performance on detection and segmentation.
Offers a lightweight, stage-wise design suitable for various vision tasks.
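The abstract reports ConvMLP-S at 9M parameters and 2.4G MACs, stated as 15% and 19% of MLP-Mixer-B/16. As a rough sanity check, the snippet below back-computes the baseline size those ratios imply. The percentages are rounded, so the results are approximations rather than the baseline's published figures, and the variable names are illustrative only.

```python
# Figures quoted in the abstract for ConvMLP-S.
convmlp_s_params_m = 9.0   # parameters, in millions
convmlp_s_macs_g = 2.4     # multiply-accumulate operations, in billions

# Ratios relative to MLP-Mixer-B/16, as quoted (rounded) in the abstract.
params_fraction = 0.15
macs_fraction = 0.19

# Implied baseline size: an estimate derived from rounded ratios,
# not a number taken from the MLP-Mixer paper.
implied_params_m = convmlp_s_params_m / params_fraction
implied_macs_g = convmlp_s_macs_g / macs_fraction

print(f"Implied MLP-Mixer-B/16 size: about {implied_params_m:.0f}M parameters, "
      f"about {implied_macs_g:.1f}G MACs")
```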
Abstract
MLP-based architectures, which consist of a sequence of consecutive multi-layer perceptron blocks, have recently been found to reach results comparable to convolutional and transformer-based methods. However, most adopt spatial MLPs, which take fixed-dimension inputs, making them difficult to apply to downstream tasks such as object detection and semantic segmentation. Moreover, single-stage designs further limit performance in other computer vision tasks, and fully connected layers are computationally heavy. To tackle these problems, we propose ConvMLP: a hierarchical Convolutional MLP for visual recognition, which is a lightweight, stage-wise co-design of convolution layers and MLPs. In particular, ConvMLP-S achieves 76.8% top-1 accuracy on ImageNet-1k with 9M parameters and 2.4G MACs (15% and 19% of MLP-Mixer-B/16, respectively). Experiments on object detection and…
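To illustrate why spatial MLPs tie a model to a fixed input size, here is a minimal pure-Python sketch. It is not the ConvMLP implementation: a single linear layer stands in for each MLP, and all function names (`make_weight`, `channel_mlp`, `spatial_mlp`, and so on) are hypothetical. A channel-mixing layer has a C x C weight and works for any number of tokens, while a spatial (token-mixing) layer has an N x N weight and rejects an input with a different number of tokens.

```python
import random


def make_weight(out_features, in_features, seed=0):
    """Build an (out_features x in_features) matrix of small random values."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_features)]
            for _ in range(out_features)]


def make_tokens(num_tokens, num_channels, seed=1):
    """Build a flattened feature map: num_tokens vectors of num_channels each."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(num_channels)]
            for _ in range(num_tokens)]


def apply_linear(weight, vec):
    """Multiply a weight matrix by a vector, checking that the sizes agree."""
    if len(vec) != len(weight[0]):
        raise ValueError(
            f"expected input of length {len(weight[0])}, got {len(vec)}")
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def channel_mlp(tokens, weight):
    """Mix channels within each token; the weight is C x C, independent of N."""
    return [apply_linear(weight, token) for token in tokens]


def spatial_mlp(tokens, weight):
    """Mix tokens within each channel; the weight is N x N, so N is fixed."""
    num_channels = len(tokens[0])
    # Gather each channel across all tokens (transpose N x C to C x N).
    columns = [[token[c] for token in tokens] for c in range(num_channels)]
    mixed = [apply_linear(weight, column) for column in columns]
    # Transpose back to N x C.
    return [[mixed[c][i] for c in range(num_channels)]
            for i in range(len(weight))]


NUM_CHANNELS = 3
tokens_2x2 = make_tokens(4, NUM_CHANNELS)  # 2x2 feature map, flattened
tokens_3x3 = make_tokens(9, NUM_CHANNELS)  # 3x3 feature map, flattened

channel_weight = make_weight(NUM_CHANNELS, NUM_CHANNELS)
spatial_weight = make_weight(4, 4)  # tied to exactly 4 tokens

# The channel layer accepts both resolutions.
print(len(channel_mlp(tokens_2x2, channel_weight)),
      len(channel_mlp(tokens_3x3, channel_weight)))

# The spatial layer only accepts the resolution it was built for.
print(len(spatial_mlp(tokens_2x2, spatial_weight)))
try:
    spatial_mlp(tokens_3x3, spatial_weight)
except ValueError as err:
    print("spatial layer rejects the larger input:", err)
```

Detection and segmentation pipelines typically feed images of varying resolution, which is why a layer whose weight shape depends on the token count is awkward to reuse there.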
Taxonomy
Topics: Advanced Neural Network Applications · Machine Learning and ELM · Domain Adaptation and Few-Shot Learning
Methods: Region Proposal Network · Depthwise Convolution · Residual Connection · Dense Connections · ConvMLP · Convolution · Mask R-CNN · Feature Pyramid Network
