Hire-MLP: Vision MLP via Hierarchical Rearrangement
Jianyuan Guo, Yehui Tang, Kai Han, Xinghao Chen, Han Wu, Chao Xu, Chang Xu, Yunhe Wang

TL;DR
Hire-MLP introduces a hierarchical rearrangement approach for vision MLPs, capturing local and global information effectively, leading to competitive performance across multiple vision tasks with improved flexibility and efficiency.
Contribution
The paper proposes Hire-MLP, a novel vision MLP architecture that uses hierarchical rearrangement to capture spatial information more effectively and to serve as a versatile backbone for various vision tasks.
Findings
Achieves 83.8% top-1 accuracy on ImageNet.
Surpasses previous models in object detection and segmentation.
Demonstrates effective global and local feature integration.
Abstract
Previous vision MLPs such as MLP-Mixer and ResMLP accept linearly flattened image patches as input, which makes them inflexible to different input sizes and makes it hard for them to capture spatial information. This approach keeps MLPs from achieving performance comparable to their transformer-based counterparts and prevents them from becoming a general backbone for computer vision. This paper presents Hire-MLP, a simple yet competitive vision MLP architecture via HIerarchical REarrangement, which contains two levels of rearrangement. Specifically, the inner-region rearrangement captures local information inside a spatial region, and the cross-region rearrangement enables information communication between different regions and captures global context by circularly shifting all tokens along the spatial directions. Extensive experiments demonstrate the…
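The two levels of rearrangement described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions: a single feature map laid out as (C, H, W), regions formed along the height axis only, the h rows of each region folded into the channel axis so that a channel-mixing linear map can mix tokens inside the region, and one matrix `weight` standing in for the learned channel MLP. The cross-region step is modeled as a circular shift (`np.roll`) applied before the regions are formed and undone afterwards. All function names are hypothetical.

```python
import numpy as np


def inner_region_rearrange_h(x, h):
    """Fold every h consecutive rows (one region) into the channel axis.

    x: array of shape (C, H, W); H must be divisible by h.
    Returns an array of shape (h * C, H // h, W).
    """
    c, H, W = x.shape
    assert H % h == 0, "region height h must divide H"
    # (C, H//h, h, W) -> (h, C, H//h, W) -> (h*C, H//h, W)
    y = x.reshape(c, H // h, h, W).transpose(2, 0, 1, 3)
    return y.reshape(h * c, H // h, W)


def inner_region_restore_h(y, h):
    """Exact inverse of inner_region_rearrange_h."""
    hc, num_regions, W = y.shape
    c = hc // h
    x = y.reshape(h, c, num_regions, W).transpose(1, 2, 0, 3)
    return x.reshape(c, num_regions * h, W)


def hire_height_branch(x, h, weight, shift=0):
    """Hypothetical height branch: optional cross-region shift,
    then inner-region mixing, then the inverse shift.

    weight: (h*C, h*C) matrix standing in for the learned channel MLP.
    shift:  number of rows to circularly shift along H (0 = no shift).
    """
    # Cross-region rearrangement: circularly shift tokens along H so
    # that the regions formed next straddle the original boundaries.
    x = np.roll(x, shift, axis=1)
    # Inner-region rearrangement: rows of a region become channels,
    # so a channel-mixing map mixes tokens within each region.
    y = inner_region_rearrange_h(x, h)
    y = np.einsum("oc,chw->ohw", weight, y)
    x = inner_region_restore_h(y, h)
    # Undo the shift so tokens return to their original positions.
    return np.roll(x, -shift, axis=1)
```

With an identity `weight` the branch returns its input unchanged, and with `shift=0` perturbing one row only changes outputs inside that row's region; a nonzero `shift` lets the same perturbation reach a row from the neighboring region. A width branch could reuse the same functions by swapping axes, e.g. `hire_height_branch(x.transpose(0, 2, 1), h, w).transpose(0, 2, 1)`.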
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Robotics and Sensor-Based Localization
Methods: Affine Operator · Feedforward Network · Residual Multi-Layer Perceptrons · Average Pooling · Dense Connections · Global Average Pooling · Residual Connection · Dropout · Layer Normalization
