X-MLP: A Patch Embedding-Free MLP Architecture for Vision
Xinyue Wang, Zhicheng Cai, Chenglei Peng

TL;DR
X-MLP is a patch embedding-free MLP architecture for vision. It uses fully connected layers to mix information along the width, height, and channel dimensions independently and alternately, outperforms other vision MLP models on ten benchmark datasets, and captures long-range dependencies.
Contribution
The paper presents a vision MLP architecture that eliminates the need for patch embedding and uses only fully connected layers to model spatial and channel interactions.
Findings
Outperforms existing vision MLP models on ten benchmark datasets.
Surpasses CNNs in accuracy across various datasets.
Captures long-range pixel dependencies, as shown through mathematical visualization.
Abstract
Convolutional neural networks (CNNs) and vision transformers (ViTs) have achieved great success in computer vision. Recently, research on multi-layer perceptron (MLP) architectures for vision has become popular again. Vision MLPs are designed to be independent of convolution and self-attention operations. However, existing vision MLP architectures always depend on convolution for patch embedding. We therefore propose X-MLP, an architecture built entirely on fully connected layers and free from patch embedding. It fully decouples the features and uses MLPs to exchange information along the width, height, and channel dimensions independently and alternately. X-MLP is tested on ten benchmark datasets and outperforms other vision MLP models on all of them. It even surpasses CNNs by a clear margin on various datasets. Furthermore, through mathematically…
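The axis-wise mixing described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`mix_along_axis`, `axis_mixing_block`, `init_params`), the residual connections, the width-height-channel ordering, the square weight shapes, and the absence of normalization and activation layers are all assumptions made for illustration. The only point it shows is that a fully connected layer can be applied along one dimension of a raw (height, width, channels) tensor at a time, with no patch embedding step.

```python
import numpy as np


def mix_along_axis(x, weight, bias, axis):
    """Apply one fully connected layer along a single axis of x."""
    moved = np.moveaxis(x, axis, -1)      # bring the target axis to the end
    mixed = moved @ weight + bias         # (..., d_in) @ (d_in, d_out)
    return np.moveaxis(mixed, -1, axis)   # restore the original axis order


def init_params(height, width, channels, seed=0):
    """Hypothetical initialisation: one square weight matrix per dimension."""
    rng = np.random.default_rng(seed)
    params = {}
    for name, dim in (("width", width), ("height", height), ("channel", channels)):
        params[name] = (rng.normal(0.0, 0.02, size=(dim, dim)), np.zeros(dim))
    return params


def axis_mixing_block(x, params):
    """Mix along width, then height, then channel; x is (height, width, channels)."""
    for name, axis in (("width", 1), ("height", 0), ("channel", 2)):
        weight, bias = params[name]
        # The residual connection is an assumption, not taken from the paper.
        x = x + mix_along_axis(x, weight, bias, axis)
    return x


# The raw pixel tensor is used directly; no patch embedding is applied.
image = np.random.default_rng(1).normal(size=(32, 32, 3))
params = init_params(32, 32, 3)
out = axis_mixing_block(image, params)
print(out.shape)
```

Because each layer touches a single axis, a width-mixing step only combines pixels that share a row and channel; information reaches distant pixels when width, height, and channel steps are alternated.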
Taxonomy
Topics: Neural Networks and Applications · CCD and CMOS Imaging Sensors · Infrared Target Detection Methodologies
Methods: Convolution
