gSwin: Gated MLP Vision Model with Hierarchical Structure of Shifted Window
Mocho Go, Hideyuki Tachibana

TL;DR
gSwin is a vision model that combines Swin Transformer and gMLP, integrating hierarchical structure and locality. It achieves better accuracy than Swin Transformer on image classification, object detection, and semantic segmentation, with a smaller model size.
Contribution
This paper proposes gSwin, a model that merges Swin Transformer and (multi-head) gMLP to combine parameter efficiency and performance with locality and hierarchy in image recognition.
Findings
Outperforms Swin Transformer in accuracy on image classification, object detection, and semantic segmentation
Has a smaller model size than Swin Transformer
Effective integration of transformer and MLP architectures
Abstract
Following its success in the language domain, the self-attention mechanism (transformer) has been adopted in the vision domain, where it has recently achieved great success. As a separate stream, the multi-layer perceptron (MLP) has also been explored in the vision domain. These architectures, alternatives to traditional CNNs, have attracted attention recently, and many methods have been proposed. As a method that combines parameter efficiency and performance with locality and hierarchy in image recognition, we propose gSwin, which merges the two streams: Swin Transformer and (multi-head) gMLP. We show that gSwin achieves better accuracy than Swin Transformer on three vision tasks (image classification, object detection, and semantic segmentation) with a smaller model size.
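To make the combination named in the abstract more concrete, here is a minimal NumPy sketch of a gMLP-style spatial gating unit applied to tokens inside shifted local windows. It is an illustration under stated assumptions, not the authors' implementation: it uses a single head, omits the masking that Swin Transformer applies after a cyclic shift, and the names window_partition, spatial_gating_unit, W_spatial, and b_spatial are hypothetical.

```python
import numpy as np

def window_partition(x, window):
    """Split an (H, W, C) feature map into non-overlapping windows.

    Returns an array of shape (num_windows, window * window, C).
    """
    H, W, C = x.shape
    assert H % window == 0 and W % window == 0
    x = x.reshape(H // window, window, W // window, window, C)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, window * window, C)

def layer_norm(x, eps=1e-5):
    """Normalize over the channel axis (no learned scale or bias)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def spatial_gating_unit(z, W_spatial, b_spatial):
    """gMLP-style gating: split channels, mix tokens in one half, gate the other.

    z:         (num_windows, N, C) with even C
    W_spatial: (N, N) token-mixing weights, shared across windows (assumption)
    b_spatial: (N,) token-mixing bias
    """
    z1, z2 = np.split(z, 2, axis=-1)
    z2 = layer_norm(z2)
    # gate[w, n, c] = sum_m W_spatial[n, m] * z2[w, m, c] + b_spatial[n]
    gate = np.einsum("nm,wmc->wnc", W_spatial, z2) + b_spatial[None, :, None]
    return z1 * gate

rng = np.random.default_rng(0)
feature_map = rng.normal(size=(8, 8, 4))   # toy (H, W, C) feature map
window = 4

# Shifted-window step: cyclically roll the map by half a window, then partition.
shifted = np.roll(feature_map, shift=(-(window // 2), -(window // 2)), axis=(0, 1))
tokens = window_partition(shifted, window)  # (4 windows, 16 tokens, 4 channels)

# gMLP initializes the spatial weights near zero and the bias to one,
# so the unit starts out close to an identity on the first channel half.
n_tokens = window * window
W_spatial = np.zeros((n_tokens, n_tokens))
b_spatial = np.ones(n_tokens)

out = spatial_gating_unit(tokens, W_spatial, b_spatial)
```

With the zero weights and unit bias used here, the gate equals one everywhere, so out is simply the first half of the channels of each window. A trained model would learn non-trivial W_spatial values that mix information between tokens of the same window.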
Taxonomy
Topics: Neural Networks and Applications · Digital Imaging for Blood Diseases · Advanced Neural Network Applications
Methods: Attention Is All You Need · Linear Layer · Label Smoothing · Stochastic Depth · Layer Normalization · Byte Pair Encoding · Dense Connections · Residual Connection · Multi-Head Attention · Spatial Gating Unit
