An Image Patch is a Wave: Phase-Aware Vision MLP

Yehui Tang; Kai Han; Jianyuan Guo; Chang Xu; Yanxi Li; Chao Xu; Yunhe Wang

arXiv:2111.12294 · cs.CV · April 7, 2022 · 5 cites



TL;DR

This paper introduces Wave-MLP, a novel vision architecture that models image tokens as wave functions with amplitude and phase, enabling dynamic token aggregation and improving performance across multiple vision tasks.

Contribution

The paper proposes a wave-based token representation in which a phase term modulates how MLP layers aggregate tokens, making the aggregation dynamic and achieving state-of-the-art results in vision tasks.

Findings

1. Wave-MLP outperforms existing MLP architectures on image classification.
2. Wave-MLP achieves superior results in object detection.
3. Wave-MLP improves semantic segmentation performance.

Abstract

In the field of computer vision, recent works show that a pure MLP architecture, built mainly from stacked fully-connected layers, can achieve performance competitive with CNNs and transformers. A vision MLP usually splits the input image into multiple tokens (patches), but existing MLP models aggregate these tokens directly with fixed weights, neglecting the varying semantic information of tokens from different images. To aggregate tokens dynamically, we propose to represent each token as a wave function with two parts, amplitude and phase. The amplitude is the original feature, and the phase term is a complex value that changes according to the semantic content of the input image. Introducing the phase term can dynamically modulate the relationship between tokens and the fixed weights in the MLP. Based on this wave-like token representation, we establish a novel Wave-MLP architecture for vision tasks. Extensive…
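The abstract's core idea (amplitude is the original feature, phase depends on the input, tokens are then mixed with fixed weights) can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes: a hypothetical linear phase estimator `w_phase`; complex mixing weights written as `w_cos - 1j * w_sin`, so that keeping the real part yields a cosine term and a sine term with separate real weights; and the function name `phase_aware_token_mixing`, which is illustrative only.

```python
import numpy as np


def phase_aware_token_mixing(x, w_cos, w_sin, w_phase):
    """Aggregate N tokens with fixed weights, treating each token as a wave.

    Each token k is viewed as z_k = a_k * exp(i * theta_k): the amplitude a_k
    is the original feature and theta_k is an input-dependent phase.
    Mixing with complex weights (w_cos - 1j * w_sin) and keeping the real
    part gives, for each output token j:
        o_j = sum_k w_cos[j, k] * a_k * cos(theta_k) + w_sin[j, k] * a_k * sin(theta_k)

    x:       (N, C) real token features, used directly as the amplitude
    w_cos:   (N, N) fixed mixing weights for the cosine part
    w_sin:   (N, N) fixed mixing weights for the sine part
    w_phase: (C, C) hypothetical linear phase estimator (an assumption of this sketch)
    """
    amplitude = x
    theta = x @ w_phase                  # phase changes with the input content
    z = amplitude * np.exp(1j * theta)   # complex, wave-like tokens, shape (N, C)
    weights = w_cos - 1j * w_sin         # fixed complex mixing weights, shape (N, N)
    return (weights @ z).real            # back to real features, shape (N, C)


rng = np.random.default_rng(0)
N, C = 4, 8
x = rng.standard_normal((N, C))
w_cos = rng.standard_normal((N, N))
w_sin = rng.standard_normal((N, N))

# With zero phase the layer reduces to an ordinary fixed-weight token-mixing MLP.
out_static = phase_aware_token_mixing(x, w_cos, w_sin, np.zeros((C, C)))

# With an input-dependent phase, the same fixed weights yield a different aggregation.
w_phase = 0.5 * rng.standard_normal((C, C))
out_dynamic = phase_aware_token_mixing(x, w_cos, w_sin, w_phase)
```

The zero-phase case shows why the phase matters: without it, every image is mixed by the same `w_cos`. Once the phase depends on the tokens, the effective contribution of token k (its amplitude scaled by cos and sin of its phase) varies from image to image even though `w_cos` and `w_sin` stay fixed.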


Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning