Are we ready for a new paradigm shift? A Survey on Visual Deep MLP
Ruiyang Liu, Yinghui Li, Linmi Tao, Dun Liang, Hai-Tao Zheng

TL;DR
This survey examines whether deep MLP models could represent a new paradigm in computer vision, analyzing their connections, advantages, limitations, and recent variants in the context of current GPU-based methods.
Contribution
It provides a comprehensive comparison of convolution, self-attention, and Token-mixing MLP, and discusses future directions for MLP-based paradigms in vision tasks.
Findings
Token-mixing MLP has unique advantages and limitations.
Recent MLP variants show promising architecture and application diversity.
In the GPU era, the current mainstream approaches are convolution, self-attention, and MLP.
Abstract
Recently proposed deep MLP models have stirred up a lot of interest in the vision community. Historically, the availability of larger datasets combined with increased computing capacity has led to paradigm shifts. This review paper discusses in detail whether MLP can be a new paradigm for computer vision. We compare the intrinsic connections and differences between convolution, the self-attention mechanism, and Token-mixing MLP. We present the advantages and limitations of Token-mixing MLP, followed by a careful analysis of recent MLP-like variants, from module design to network architecture, and their applications. In the GPU era, locally and globally weighted summations are the current mainstream, represented by convolution and the self-attention mechanism, as well as MLP. We suggest the further development of paradigm to be considered alongside the…
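The "locally and globally weighted summations" framing in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function names, the 1-D token layout, and the simplifications (no learned query/key/value projections in the attention, a single linear layer instead of a full two-layer token-mixing MLP) are assumptions made here for illustration only.

```python
import math

def conv1d_mix(tokens, kernel):
    """Local weighted sum: one small kernel is shared across all positions."""
    n, half = len(tokens), len(kernel) // 2
    out = []
    for i in range(n):
        acc = [0.0] * len(tokens[0])
        for offset, w in enumerate(kernel):
            j = i + offset - half
            if 0 <= j < n:  # zero padding outside the sequence
                acc = [a + w * x for a, x in zip(acc, tokens[j])]
        out.append(acc)
    return out

def self_attention_mix(tokens):
    """Global weighted sum: weights are computed from the input itself.
    Simplification (assumption): queries, keys, and values are the raw tokens."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * t[c] for w, t in zip(weights, tokens)) for c in range(d)])
    return out

def token_mixing_linear(tokens, weight):
    """Global weighted sum: a fixed N x N matrix that does not depend on the input.
    Simplification (assumption): one linear layer, no nonlinearity."""
    n, d = len(tokens), len(tokens[0])
    return [[sum(weight[i][j] * tokens[j][c] for j in range(n)) for c in range(d)]
            for i in range(n)]

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]       # 3 tokens, 2 channels
local_out = conv1d_mix(tokens, [0.25, 0.5, 0.25])    # each token sees its neighbours
attn_out = self_attention_mix(tokens)                # each token sees all tokens, input-dependent weights
mlp_out = token_mixing_linear(tokens, [[1 / 3] * 3 for _ in range(3)])  # all tokens, static weights
```

In this sketch all three operations mix information across tokens by weighted summation. They differ in which tokens contribute (a local window versus the whole sequence) and in where the weights come from (a shared kernel, the input itself, or a fixed position-specific matrix).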
Taxonomy
Topics: Advanced Memory and Neural Computing · Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Convolution · Softmax · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Absolute Position Encodings
