Strip-MLP: Efficient Token Interaction for Vision MLP
Guiping Cao, Shengda Luo, Wenjian Huang, Xiangyuan Lan, Dongmei Jiang, Yaowei Wang, Jianguo Zhang

TL;DR
Strip-MLP introduces a novel token interaction method with cross-strip, cross-patch, and local region modules, significantly enhancing the expressive power of MLP-based vision models, especially on small datasets.
Contribution
The paper proposes Strip-MLP, a new MLP paradigm whose modules keep token interaction effective even when feature maps are down-sampled to a small spatial size, and which outperforms existing MLP-based models.
Findings
Improves Top-1 accuracy by 2.44% on Caltech-101 over existing MLP-based models.
Improves Top-1 accuracy by 2.16% on CIFAR-100 over existing MLP-based models.
Outperforms existing MLP-based models on multiple datasets.
Abstract
The token interaction operation is one of the core modules in MLP-based models; it exchanges and aggregates information between different spatial locations. However, the power of token interaction along the spatial dimension depends heavily on the spatial resolution of the feature maps, which limits the model's expressive ability, especially in deep layers where the features are down-sampled to a small spatial size. To address this issue, we present a novel method called Strip-MLP that enriches token interaction in three ways. Firstly, we introduce a new MLP paradigm called the Strip MLP layer, which allows a token to interact with other tokens in a cross-strip manner, enabling the tokens in a row (or column) to contribute to the information aggregation in adjacent but different strips of rows (or columns). Secondly, a Cascade Group Strip Mixing…
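The two ideas in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the single-channel list-of-lists feature map, the one-weight-matrix-per-row-offset scheme, and the choice to skip rows outside the map are all assumptions made for illustration. It shows how the weight count of a width-wise token-mixing layer shrinks with the feature map, and one plausible reading of "cross-strip" aggregation in which tokens from adjacent rows contribute to a row's output.

```python
# Illustrative sketch only; names and the cross-strip scheme are assumptions,
# not the Strip MLP layer as defined in the paper.

def row_mix(x, weight):
    """Mix tokens within each row: out[h][j] = sum_i weight[j][i] * x[h][i]."""
    return [[sum(wj[i] * row[i] for i in range(len(row))) for wj in weight]
            for row in x]


def cross_strip_row_mix(x, weights):
    """Assumed cross-strip variant of row mixing.

    Each output row aggregates tokens from the rows of its strip (itself plus
    neighbours at offsets -r..r), using one weight matrix per row offset.
    Rows that fall outside the feature map are skipped.
    """
    height, width = len(x), len(x[0])
    radius = len(weights) // 2
    out = [[0.0] * width for _ in range(height)]
    for h in range(height):
        for k, weight in enumerate(weights):
            src = h + k - radius  # source row for this offset
            if 0 <= src < height:
                mixed = row_mix([x[src]], weight)[0]
                out[h] = [a + b for a, b in zip(out[h], mixed)]
    return out


def mixing_weight_count(width):
    """Weights in a plain width-wise token-mixing layer (width x width)."""
    return width * width


# A 3x2 single-channel feature map and identity weights for every offset,
# so each output row is simply the sum of the rows in its strip.
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
identity = [[1, 0], [0, 1]]
y = cross_strip_row_mix(x, [identity] * 3)
```

With identity weights, the middle row of `y` is the sum of all three input rows, which makes the cross-row contribution easy to see. `mixing_weight_count` makes the resolution dependence concrete: a width of 56 gives 3136 mixing weights, while a width of 7 gives only 49.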
Taxonomy
Topics: Advanced Neural Network Applications · AI in cancer detection · Domain Adaptation and Few-Shot Learning
