Mixing and Shifting: Exploiting Global and Local Dependencies in Vision MLPs

Huangjie Zheng, Pengcheng He, Weizhu Chen, Mingyuan Zhou

arXiv:2202.06510 · cs.CV · February 15, 2022 · 6 citations


TL;DR

This paper introduces Mix-Shift-MLP (MS-MLP), a simple yet effective model that exploits both global and local dependencies in vision tasks by making the size of the local receptive field used for mixing grow with the amount of spatial shifting. It achieves competitive results without self-attention.
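The link between shifting and receptive-field size can be made concrete with a 1-D calculation. This is a worked illustration under an assumption not stated on this page: that a channel group shifted by $s$ tokens is first averaged over a centered window of $2s+1$ tokens ($s \ge 0$). The exact mixing operator in MS-MLP may differ.

```latex
\tilde{x}_h = \frac{1}{2s+1}\sum_{j=-s}^{s} x_{h+j},
\qquad
y_i = \tilde{x}_{i-s}
    = \frac{1}{2s+1}\sum_{j=-s}^{s} x_{i-s+j}
    = \frac{1}{2s+1}\sum_{m=i-2s}^{i} x_m
```

Under this assumption, output token $y_i$ depends on the $2s+1$ input tokens $x_{i-2s}, \dots, x_i$: a group with $s = 0$ passes tokens through unchanged, while a group with $s = 2$ gathers five tokens, the farthest of which lies four positions away.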

Contribution

The paper proposes Mix-Shift-MLP, a novel architecture that combines mixing and shifting techniques to capture both global and local dependencies without using self-attention.

Findings

1. Achieves 83.8% top-1 accuracy on ImageNet-1K with 85M parameters.
2. Improves performance when combined with Vision Transformers such as Swin Transformer.
3. Offers a simple implementation with competitive benchmark results.

Abstract

Token-mixing multi-layer perceptron (MLP) models have shown competitive performance in computer vision tasks with a simple architecture and relatively small computational cost. Their success in maintaining computation efficiency is mainly attributed to avoiding the use of self-attention that is often computationally heavy, yet this is at the expense of not being able to mix tokens both globally and locally. In this paper, to exploit both global and local dependencies without self-attention, we present Mix-Shift-MLP (MS-MLP) which makes the size of the local receptive field used for mixing increase with respect to the amount of spatial shifting. In addition to conventional mixing and shifting techniques, MS-MLP mixes both neighboring and distant tokens from fine- to coarse-grained levels and then gathers them via a shifting operation. This directly contributes to the interactions between…
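The mix-then-shift idea described in the abstract can be sketched in NumPy. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes that channels are split into groups, that the group assigned shift $s$ is averaged over a centered window of $2|s|+1$ tokens along one spatial axis, and that the result is then shifted by $s$ with zero fill. The helper names `local_mix_1d`, `shift_1d`, and `mix_shift`, as well as the default shifts, are illustrative choices.

```python
import numpy as np


def local_mix_1d(x, k, axis):
    """Average every token with its neighbours in a centered window of odd
    size k along `axis`, treating out-of-range positions as zeros."""
    if k <= 1:
        return x.astype(float)  # window of 1: no mixing, just a float copy
    pad = k // 2
    pad_width = [(0, 0)] * x.ndim
    pad_width[axis] = (pad, pad)
    xp = np.pad(x, pad_width)  # zero padding on both sides of `axis`
    n = x.shape[axis]
    out = np.zeros(x.shape, dtype=float)
    for offset in range(k):
        # Slice of the padded array aligned with window position `offset`.
        out += np.take(xp, np.arange(offset, offset + n), axis=axis)
    return out / k


def shift_1d(x, s, axis):
    """Move tokens by s positions along `axis` (positive = toward higher
    indices), filling vacated positions with zeros."""
    n = x.shape[axis]
    out = np.zeros_like(x)
    if abs(s) >= n:
        return out  # everything is shifted out of range
    src = [slice(None)] * x.ndim
    dst = [slice(None)] * x.ndim
    if s >= 0:
        src[axis], dst[axis] = slice(0, n - s), slice(s, n)
    else:
        src[axis], dst[axis] = slice(-s, n), slice(0, n + s)
    out[tuple(dst)] = x[tuple(src)]
    return out


def mix_shift(x, shifts=(-2, -1, 0, 1, 2), axis=1):
    """Hypothetical mix-shift along one spatial axis of x with shape (C, H, W).

    Channels are split into len(shifts) groups; the group assigned shift s is
    first mixed over a window of 2*|s| + 1 tokens and then shifted by s, so the
    mixing window grows with the amount of shifting."""
    groups = np.array_split(x, len(shifts), axis=0)
    out = []
    for group, s in zip(groups, shifts):
        mixed = local_mix_1d(group, 2 * abs(s) + 1, axis)
        out.append(shift_1d(mixed, s, axis))
    return np.concatenate(out, axis=0)
```

Feeding in a single impulse shows the effect: the unshifted group passes it through unchanged, while the groups with $|s| = 2$ spread it over five positions that end up displaced toward one side. Because groups with larger $|s|$ see wider and more distant windows, fine-grained neighbours and coarser, farther tokens are gathered in the same layer. In a full MLP block this would presumably be applied along both height and width and followed by a channel-mixing linear layer, both of which are omitted here.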


Taxonomy

Topics: Neural Networks and Applications · Machine Learning and ELM · Advanced Neural Network Applications

Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Dropout · Label Smoothing · Position-Wise Feed-Forward Layer · Adam · Stochastic Depth · Residual Connection · Absolute Position Encodings