MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models

Chenglin Yang; Siyuan Qiao; Qihang Yu; Xiaoding Yuan; Yukun Zhu; Alan Yuille; Hartwig Adam; Liang-Chieh Chen

arXiv:2210.01820·cs.CV·February 1, 2023·22 cites

TL;DR

MOAT merges mobile convolution (inverted residual blocks) and self-attention into a single block, reaching 89.1% top-1 accuracy on ImageNet-1K with ImageNet-22K pretraining and transferring to downstream tasks that require large-resolution inputs.

Contribution

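The paper proposes the MOAT block. Starting from a standard Transformer block, the multi-layer perceptron is replaced with a mobile convolution block, which is then moved before the self-attention operation. This merged block replaces the common practice of stacking separate mobile convolution and Transformer blocks.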

Findings

01 Achieves 89.1% top-1 accuracy on ImageNet-1K with ImageNet-22K pretraining

02 Surpasses several mobile transformer models on ImageNet

03 Effective for downstream tasks like detection and segmentation

Abstract

This paper presents MOAT, a family of neural networks that build on top of MObile convolution (i.e., inverted residual blocks) and ATtention. Unlike the current works that stack separate mobile convolution and transformer blocks, we effectively merge them into a MOAT block. Starting with a standard Transformer block, we replace its multi-layer perceptron with a mobile convolution block, and further reorder it before the self-attention operation. The mobile convolution block not only enhances the network representation capacity, but also produces better downsampled features. Our conceptually simple MOAT networks are surprisingly effective, achieving 89.1% / 81.5% top-1 accuracy on ImageNet-1K / ImageNet-1K-V2 with ImageNet22K pretraining. Additionally, MOAT can be seamlessly applied to downstream tasks that require large resolution inputs by simply converting the global attention to…
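The block ordering described in the abstract (mobile convolution first, then self-attention, each with a residual connection) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the single attention head, the channel-wise normalization, the 4x expansion ratio, the tanh GELU approximation, and the random weights are all assumptions made for illustration, and the downsampling variant of the block is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def gelu(x):
    # Tanh approximation of GELU (assumed activation for this sketch).
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def channel_norm(x, eps=1e-5):
    # Normalize each spatial position over its channels.
    # A simplified stand-in for the normalization layers of a real network.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def depthwise_conv3x3(x, kernel):
    # x: (H, W, C), kernel: (3, 3, C). Zero padding, stride 1.
    # Each channel is filtered independently with its own 3x3 kernel.
    h, w, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w, :] * kernel[dy, dx, :]
    return out

def mobile_conv(x, p):
    # Inverted residual body: 1x1 expand -> 3x3 depthwise -> 1x1 project.
    y = gelu(x @ p["expand"])
    y = gelu(depthwise_conv3x3(y, p["depthwise"]))
    return y @ p["project"]

def self_attention(x, p):
    # Single-head global self-attention over all H*W positions.
    h, w, c = x.shape
    tokens = x.reshape(h * w, c)
    q, k, v = tokens @ p["wq"], tokens @ p["wk"], tokens @ p["wv"]
    scores = q @ k.T / np.sqrt(c)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return ((weights @ v) @ p["wo"]).reshape(h, w, c)

def moat_block(x, p):
    # The mobile convolution takes the place of the Transformer MLP
    # and runs before self-attention; both branches are residual.
    x = x + mobile_conv(channel_norm(x), p)
    x = x + self_attention(channel_norm(x), p)
    return x

def init_params(c, expansion=4, scale=0.02):
    # Hypothetical random initialization, for shape checking only.
    e = c * expansion
    shapes = {"expand": (c, e), "depthwise": (3, 3, e), "project": (e, c),
              "wq": (c, c), "wk": (c, c), "wv": (c, c), "wo": (c, c)}
    return {name: rng.normal(0.0, scale, shape) for name, shape in shapes.items()}

params = init_params(c=8)
features = rng.normal(size=(4, 4, 8))  # (H, W, C) feature map
out = moat_block(features, params)
print(out.shape)  # (4, 4, 8)
```

Because both branches are residual and preserve the (H, W, C) shape, blocks of this form can be stacked directly. The sketch makes no attempt to reproduce the reported accuracy figures.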

Videos

MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models (SlidesLive)

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Brain Tumor Detection and Classification

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · Label Smoothing · Softmax · Byte Pair Encoding · Convolution · Adam · Dense Connections