Parameterization of Cross-Token Relations with Relative Positional Encoding for Vision MLP

Zhicai Wang; Yanbin Hao; Xingyu Gao; Hao Zhang; Shuo Wang; Tingting Mu; Xiangnan He

arXiv:2207.07284 · cs.CV · September 13, 2022

TL;DR

This paper introduces PosMLP, a new vision MLP architecture that uses a relative positional encoding-based spatial gating unit to efficiently model cross-token relations, reducing parameters and improving performance.

Contribution

It proposes a novel positional spatial gating unit (PoSGU) leveraging relative positional encoding to enhance token relations in vision MLPs, reducing parameter complexity from quadratic to linear or constant.
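A minimal NumPy sketch of the idea, assuming a gMLP-style gating unit (split the channels into a gate half and a value half, mix the value half across tokens, then multiply) and a Swin-style learnable relative-position table for the mixing weights. The class `RPEGatingUnit` and the helper `relative_position_index` are hypothetical names; this is not taken from the official zhicaiwww/posmlp code. What it illustrates is the parameter sharing: every entry of the N x N token-mixing matrix is looked up by the relative offset between two tokens, so only (2h-1)(2w-1) weights are learned, which grows as O(N) rather than N^2.

```python
import numpy as np


def relative_position_index(h, w):
    """Map every (token a, token b) pair on an h x w grid to an index into a
    table of (2h-1)*(2w-1) distinct relative offsets (Swin-style indexing)."""
    coords = np.stack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"))  # (2, h, w)
    coords = coords.reshape(2, -1)                       # (2, N), token k = row * w + col
    rel = coords[:, :, None] - coords[:, None, :]        # (2, N, N) row/col offsets
    dy = rel[0] + (h - 1)                                # shift to [0, 2h-2]
    dx = rel[1] + (w - 1)                                # shift to [0, 2w-2]
    return dy * (2 * w - 1) + dx                         # (N, N) integer indices


class RPEGatingUnit:
    """Hypothetical spatial gating unit whose token-mixing matrix is gathered
    from a learnable relative-position table instead of N^2 free weights."""

    def __init__(self, h, w, rng=None):
        rng = np.random.default_rng(0) if rng is None else rng
        self.table = rng.normal(scale=0.02, size=(2 * h - 1) * (2 * w - 1))
        self.index = relative_position_index(h, w)

    def mixing_matrix(self):
        # Entries that share a relative offset share a single weight.
        return self.table[self.index]                    # (N, N)

    def __call__(self, x):
        # x: (N, C) tokens with an even channel count C.
        u, v = np.split(x, 2, axis=-1)                   # gate half, value half
        v = self.mixing_matrix() @ v                     # mix information across tokens
        return u * v                                     # elementwise gating, (N, C/2)


sgu = RPEGatingUnit(h=4, w=4)
x = np.random.default_rng(1).normal(size=(16, 8))        # N = 16 tokens, C = 8 channels
y = sgu(x)
print(y.shape)          # (16, 4)
print(sgu.table.size)   # 49, versus 16 * 16 = 256 for a dense token mixer
```

Because the weight depends only on the offset, the mixer is translation-equivariant on the grid, which is one way a gating unit can gain a local, position-aware bias that a dense token-mixing matrix lacks.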

Findings

1. Achieves higher accuracy on ImageNet-1K with fewer parameters.
2. Reduces the token-mixing parameter complexity from O(N^2) to O(N) or O(1), where N is the number of tokens.
3. Demonstrates competitive or improved performance compared with existing models.
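To make the complexity claim concrete, the sketch below counts the learnable weights in one token-mixing matrix on an h x w token grid under three parameterizations: a dense matrix (one weight per token pair), a relative-position table (one weight per distinct offset), and a constant-size parametric function of the offset. The function name `token_mixing_params` and the figure of 3 parameters for the constant-size case are assumptions for illustration, not numbers reported by the paper.

```python
def token_mixing_params(h, w, const_params=3):
    """Count learnable weights in one token-mixing matrix on an h x w grid
    (hypothetical helper; const_params is an assumed constant)."""
    n = h * w  # number of tokens N
    return {
        "dense": n * n,                          # O(N^2): one weight per token pair
        "rpe_table": (2 * h - 1) * (2 * w - 1),  # O(N): one weight per offset, about 4N
        "constant": const_params,                # O(1): independent of grid size
    }


print(token_mixing_params(14, 14))
# {'dense': 38416, 'rpe_table': 729, 'constant': 3}

for side in (7, 14, 28, 56):
    counts = token_mixing_params(side, side)
    print(side, counts["dense"], counts["rpe_table"], counts["constant"])
```

The gap widens with resolution: on a 56 x 56 grid a dense mixer needs 9,834,496 weights per matrix, while the offset table needs 12,321.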

Abstract

Vision multi-layer perceptrons (MLPs) have shown promising performance in computer vision tasks and have become the main competitors of CNNs and vision Transformers. They use token-mixing layers to capture cross-token interactions, as opposed to the multi-head self-attention mechanism used by Transformers. However, the heavily parameterized token-mixing layers naturally lack mechanisms to capture local information and multi-granular non-local relations, so their discriminative power is restrained. To tackle this issue, we propose a new positional spatial gating unit (PoSGU). It exploits the attention formulations used in the classical relative positional encoding (RPE) to efficiently encode the cross-token relations for token mixing. It can successfully reduce the current quadratic parameter complexity $O(N^2)$ of vision MLPs to $O(N)$ and $O(1)$. We experiment with two RPE mechanisms,…
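The abstract attributes the O(1) case to attention formulations borrowed from RPE. One well-known constant-parameter relative encoding is the quadratic encoding of Cordonnier et al. (2020), in which a head scores each key by the squared distance between its offset from the query and a learned preferred offset. The sketch below builds a softmax-normalized N x N mixing matrix from just three scalars (a sharpness `alpha` and an offset `delta`). The function names are hypothetical, and this is not claimed to be PoSGU's exact formulation.

```python
import numpy as np


def grid_coords(h, w):
    """(N, 2) array of (row, col) positions for an h x w token grid."""
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return np.stack([ys.ravel(), xs.ravel()], axis=-1).astype(float)


def quadratic_mixing(h, w, alpha, delta):
    """Token-mixing matrix parameterized by 3 scalars: alpha and delta = (dy, dx).
    Row q attends most strongly to the key located at offset delta from q."""
    p = grid_coords(h, w)
    rel = p[None, :, :] - p[:, None, :]                    # rel[q, k] = pos(k) - pos(q)
    dist2 = ((rel - np.asarray(delta, float)) ** 2).sum(-1)
    scores = -alpha * dist2
    scores -= scores.max(axis=1, keepdims=True)            # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)    # each row sums to 1


M = quadratic_mixing(5, 5, alpha=10.0, delta=(0, 1))
center = 2 * 5 + 2                                         # token at row 2, column 2
print(M.shape)              # (25, 25)
print(M[center].argmax())   # 13, the token one column to the right
```

The parameter count stays at three no matter how large the grid is; `alpha` controls how local the mixing is, and several such heads with different offsets would be one way to cover multiple neighborhoods.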

Code & Models

Repositories

zhicaiwww/posmlp
PyTorch · Official
