Are we ready for a new paradigm shift? A Survey on Visual Deep MLP

Ruiyang Liu; Yinghui Li; Linmi Tao; Dun Liang; Hai-Tao Zheng

arXiv:2111.04060 · cs.CV · April 26, 2022

TL;DR

This survey examines whether deep MLP models could represent a new paradigm in computer vision, analyzing their connections, advantages, limitations, and recent variants in the context of current GPU-based methods.

Contribution

It provides a comprehensive comparison of convolution, self-attention, and Token-mixing MLP, and discusses future directions for MLP-based paradigms in vision tasks.

Findings

01

Token-mixing MLP has unique advantages and limitations.

02

Recent MLP variants show promising architecture and application diversity.

03

In the GPU era, the current mainstream approaches are convolution, self-attention, and MLP.

Abstract

Recently proposed deep MLP models have stirred up a lot of interest in the vision community. Historically, the availability of larger datasets combined with increased computing capacity has led to paradigm shifts. This review paper discusses in detail whether MLP can be a new paradigm for computer vision. We compare the intrinsic connections and differences between convolution, the self-attention mechanism, and Token-mixing MLP. We present the advantages and limitations of Token-mixing MLP, followed by a careful analysis of recent MLP-like variants, from module design to network architecture, and their applications. In the GPU era, locally and globally weighted summations are the current mainstream, represented by convolution and the self-attention mechanism, as well as MLP. We suggest the further development of paradigm to be considered alongside the…
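To make the "locally and globally weighted summation" comparison concrete, here is a minimal sketch in plain Python. It is not taken from the paper or its repository. It assumes one scalar feature per token, reduces self-attention to a single head with no learned projections, and reduces the Token-mixing MLP to a single linear layer across tokens (real Token-mixing MLPs stack two such layers with a nonlinearity). The function names are hypothetical.

```python
import math


def conv1d_mix(tokens, kernel):
    """Local weighted sum: each output sees a small window of neighbours,
    and the same kernel weights are reused at every position."""
    n, k = len(tokens), len(kernel)
    half = k // 2
    out = []
    for i in range(n):
        acc = 0.0
        for j in range(k):
            src = i + j - half
            if 0 <= src < n:  # positions outside the sequence act as zero padding
                acc += kernel[j] * tokens[src]
        out.append(acc)
    return out


def attention_mix(tokens):
    """Global weighted sum with data-dependent weights: the weights are a
    softmax over pairwise products, so they change with the input.
    Assumption: no query/key/value projections, single head."""
    out = []
    for q in tokens:
        scores = [q * k for k in tokens]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        out.append(sum(e / z * v for e, v in zip(exps, tokens)))
    return out


def token_mlp_mix(tokens, weight):
    """Global weighted sum with fixed weights: an n x n matrix applied
    across the token axis, independent of the input values.
    Assumption: one linear layer only, no activation or second layer."""
    return [sum(w * t for w, t in zip(row, tokens)) for row in weight]


tokens = [1.0, 2.0, 3.0, 4.0]
local = conv1d_mix(tokens, [0.25, 0.5, 0.25])
dynamic = attention_mix(tokens)
static = token_mlp_mix(tokens, [[0.25] * 4 for _ in range(4)])
```

All three functions compute each output token as a weighted sum of input tokens. They differ in which tokens contribute (a window versus the whole sequence) and in where the weights come from (a shared kernel, the input itself, or a fixed position-specific matrix). Because the matrix in `token_mlp_mix` is sized to the number of tokens, this simplified form only accepts a fixed sequence length.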

Code & Models

Repositories

liuruiyang98/Jittor-MLP
jax · Official

Taxonomy

Topics: Advanced Memory and Neural Computing · Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Convolution · Softmax · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Absolute Position Encodings