CS-Mixer: A Cross-Scale Vision MLP Model with Spatial-Channel Mixing

Jonathan Cui; David A. Araujo; Suman Saha; Md. Faisal Kabir

arXiv:2308.13363·cs.CV·October 28, 2024

Open Access

TL;DR

CS-Mixer is a hierarchical Vision MLP that models cross-scale spatial and channel interactions jointly, reaching 83.2% top-1 accuracy on ImageNet-1k at 13.7 GFLOPs.

Contribution

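The paper introduces a cross-scale spatial-channel mixing mechanism for a hierarchical Vision MLP: dynamic low-rank transformations, learned through local and global aggregation, that mix spatial and channel axes together without incurring substantially more compute.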

Findings

01. Achieves 83.2% top-1 accuracy on ImageNet-1k

02. Uses only 13.7 GFLOPs and 94M parameters

03. Outperforms previous Vision MLP models

Abstract

Despite their simpler information fusion designs compared with Vision Transformers and Convolutional Neural Networks, Vision MLP architectures have demonstrated strong performance and high data efficiency in recent research. However, existing works such as CycleMLP and Vision Permutator typically model spatial information in equal-size spatial regions and do not consider cross-scale spatial interactions. Further, their token mixers only model 1- or 2-axis correlations, avoiding 3-axis spatial-channel mixing due to its computational demands. We therefore propose CS-Mixer, a hierarchical Vision MLP that learns dynamic low-rank transformations for spatial-channel mixing through cross-scale local and global aggregation. The proposed methodology achieves competitive results on popular image recognition benchmarks without incurring substantially more compute. Our largest model, CS-Mixer-L,…
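The abstract attributes the absence of 3-axis spatial-channel mixing in earlier token mixers to its computational demands. The short sketch below illustrates that cost argument only; it is not the CS-Mixer architecture. It assumes a single dense linear map over a flattened block of height `h`, width `w`, and `c` channels, and compares its weight count against a static rank-`rank` factorization. The function names and the example sizes are hypothetical, and the paper's transformations are described as dynamic, which this sketch does not model.

```python
def full_mixing_params(h: int, w: int, c: int) -> int:
    """Weights in one dense linear map over a flattened (h, w, c) block.

    Mixing all three axes at once means every one of the n = h*w*c
    inputs can influence every one of the n outputs: n * n weights.
    """
    n = h * w * c
    return n * n


def low_rank_mixing_params(h: int, w: int, c: int, rank: int) -> int:
    """Weights in a rank-`rank` factorization W ~ U @ V.

    U has shape (n, rank) and V has shape (rank, n), so the total
    is 2 * n * rank instead of n * n.
    """
    n = h * w * c
    return 2 * n * rank


if __name__ == "__main__":
    # Hypothetical sizes chosen for illustration, not taken from the paper.
    h, w, c, rank = 7, 7, 64, 4
    full = full_mixing_params(h, w, c)
    low = low_rank_mixing_params(h, w, c, rank)
    print(f"dense 3-axis mixing: {full:,} weights")
    print(f"rank-{rank} factorization: {low:,} weights")
    print(f"reduction factor: {full / low:.0f}x")
```

Under these assumptions the saving is a factor of n / (2 * rank), so it grows with the size of the block being mixed, which is why a low-rank form can make joint spatial-channel mixing affordable where a dense map would not be.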

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Image Processing Techniques and Applications · CCD and CMOS Imaging Sensors