Convolutional Gated MLP: Combining Convolutions & gMLP
A.Rajagopal, V. Nirmala

TL;DR
This paper introduces Convolutional Gated MLP (CgMLP), a deep learning architecture that adds convolutional layers to gMLP. Like gMLP, it uses no attention mechanism, and the authors report that it generalizes better than gMLP on CIFAR.
Contribution
To the best of the authors' knowledge, this is the first work to integrate convolutions into gMLP. It contributes an implementation of the architecture, visualizations of how it learns, and experimental validation on CIFAR.
Findings
CgMLP generalizes better than gMLP on CIFAR.
CgMLP overfits less than gMLP.
Visualizations reveal how CgMLP learns features like object outlines.
Abstract
To the best of our knowledge, this is the first paper to introduce convolutions to the Gated MultiLayer Perceptron (gMLP), and it contributes an implementation of this novel deep learning architecture. Google Brain introduced gMLP in May 2021. Microsoft introduced convolutions into the Vision Transformer (CvT) in March 2021. Inspired by both gMLP and CvT, we introduce convolutional layers in gMLP. CvT combined the power of convolutions and attention. Our implementation combines the best of convolutional learning with the spatial gated MLP. Further, the paper visualizes how CgMLP learns. The visualizations show how CgMLP learns from features such as the outline of a car. While attention was the basis of much of the recent progress in deep learning, gMLP proposed an approach that does not use attention computation. In Transformer-based approaches, many attention matrices need to be learnt using vast amount of…
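To make the idea of "convolutional layers in gMLP" concrete, the following is a minimal NumPy sketch of a forward pass: a convolutional stem, patch tokenization, and one gMLP block with a spatial gating unit. It is not the authors' implementation. The gMLP block follows the published gMLP formulation (channel projection, split, spatial gating, channel projection, residual), but where the convolution sits, the single 3x3 layer, the patch size, the layer widths, and all function and parameter names (`conv3x3`, `to_tokens`, `gmlp_block`, `init_block`) are assumptions chosen for illustration.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def layer_norm(x, eps=1e-6):
    # Normalize over the channel (last) axis; learnable scale/shift omitted.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def conv3x3(image, kernels):
    """Zero-padded 3x3 convolution. image: (H, W, C_in); kernels: (3, 3, C_in, C_out)."""
    h, w, _ = image.shape
    padded = np.pad(image, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, w, kernels.shape[-1]))
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w, :] @ kernels[dy, dx]
    return out

def to_tokens(feature_map, patch):
    """Split an (H, W, C) map into non-overlapping patches: (num_patches, patch*patch*C)."""
    h, w, c = feature_map.shape
    x = feature_map.reshape(h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape((h // patch) * (w // patch), patch * patch * c)

def gmlp_block(x, p):
    """One gMLP block. x: (n_tokens, d_model)."""
    z = gelu(layer_norm(x) @ p["w_in"])            # channel projection: (n, d_ffn)
    z1, z2 = np.split(z, 2, axis=-1)               # two halves: (n, d_ffn / 2) each
    # Spatial gating unit: a linear map across tokens replaces attention.
    gate = p["w_spatial"] @ layer_norm(z2) + p["b_spatial"]
    return x + (z1 * gate) @ p["w_out"]            # project back, add residual

def init_block(rng, n_tokens, d_model, d_ffn):
    # Spatial weights start at zero and bias at one, so the gate is initially 1.
    return {
        "w_in": rng.normal(0, 0.02, (d_model, d_ffn)),
        "w_spatial": np.zeros((n_tokens, n_tokens)),
        "b_spatial": np.ones((n_tokens, 1)),
        "w_out": rng.normal(0, 0.02, (d_ffn // 2, d_model)),
    }

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))                      # CIFAR-sized input
kernels = rng.normal(0, 0.1, (3, 3, 3, 8))           # assumed stem: 8 filters
features = np.maximum(conv3x3(image, kernels), 0.0)  # convolution + ReLU
tokens = to_tokens(features, patch=8)                # 16 patches of 8*8*8 values
w_embed = rng.normal(0, 0.02, (tokens.shape[1], 64))
x = tokens @ w_embed                                 # patch embedding
block = init_block(rng, n_tokens=16, d_model=64, d_ffn=128)
y = gmlp_block(x, block)
print(y.shape)
```

The only cross-token interaction in the block is the `w_spatial` matrix, whose size depends on the number of tokens rather than on any query-key computation. In this sketch the convolution is what supplies local spatial features before tokenization; the paper's actual layer arrangement may differ.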
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Depthwise Convolution · Pointwise Convolution · Dropout · Residual Connection · Batch Normalization · Dense Connections
