Separable Self-attention for Mobile Vision Transformers

Sachin Mehta; Mohammad Rastegari

arXiv:2206.02680·cs.CV·June 7, 2022·182 cites

TL;DR

This paper introduces a separable self-attention mechanism for mobile vision transformers whose cost is linear in the number of tokens, reducing latency on resource-constrained devices while improving ImageNet accuracy over MobileViT.

Contribution

It proposes a novel separable self-attention method with linear complexity, enabling faster and more resource-efficient mobile vision transformers.

Findings

01

MobileViTv2 achieves 75.6% top-1 accuracy on ImageNet.

02

MobileViTv2 runs 3.2 times faster than MobileViT on a mobile device.

03

MobileViTv2 outperforms MobileViT by about 1% in top-1 accuracy on ImageNet.

Abstract

Mobile vision transformers (MobileViT) can achieve state-of-the-art performance across several mobile vision tasks, including classification and detection. Though these models have fewer parameters, they have high latency as compared to convolutional neural network-based models. The main efficiency bottleneck in MobileViT is the multi-headed self-attention (MHA) in transformers, which requires $O(k^{2})$ time complexity with respect to the number of tokens (or patches) $k$. Moreover, MHA requires costly operations (e.g., batch-wise matrix multiplication) for computing self-attention, impacting latency on resource-constrained devices. This paper introduces a separable self-attention method with linear complexity, i.e. $O(k)$. A simple yet effective characteristic of the proposed method is that it uses element-wise operations for computing self-attention, making it a good choice for…
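To make the $O(k)$ claim concrete, here is a minimal NumPy sketch of one way attention can be computed with element-wise operations instead of a $k \times k$ score matrix. The abstract does not spell out the layer, so the structure below (a scalar score per token, a single pooled context vector, and a ReLU-gated value branch) and all names (`separable_self_attention`, `w_i`, `w_k`, `w_v`, `w_o`) are assumptions made for illustration, not the authors' reference implementation.

```python
import numpy as np


def softmax(z, axis):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def separable_self_attention(x, w_i, w_k, w_v, w_o):
    """Single-head attention sketch with cost linear in the token count k.

    x:   (k, d) token embeddings
    w_i: (d, 1) projection producing one scalar score per token (assumed)
    w_k, w_v, w_o: (d, d) key, value, and output projections (assumed)
    """
    # One score per token, normalized over the k tokens: shape (k, 1).
    scores = softmax(x @ w_i, axis=0)
    # Weighted sum of projected keys gives one context vector: shape (1, d).
    context = (scores * (x @ w_k)).sum(axis=0, keepdims=True)
    # ReLU-gated values, modulated element-wise by the broadcast context.
    values = np.maximum(x @ w_v, 0.0)
    return (values * context) @ w_o  # shape (k, d)


rng = np.random.default_rng(0)
k, d = 16, 8  # hypothetical token count and embedding width
x = rng.standard_normal((k, d))
w_i = rng.standard_normal((d, 1))
w_k, w_v, w_o = (rng.standard_normal((d, d)) for _ in range(3))

out = separable_self_attention(x, w_i, w_k, w_v, w_o)
print(out.shape)  # (16, 8)
```

No intermediate array in this sketch has a $k \times k$ shape: every tensor is $(k, d)$, $(k, 1)$, or $(1, d)$, so time and memory grow linearly with $k$ for a fixed $d$, and the token-mixing step is a broadcasted multiply rather than a batch-wise matrix multiplication.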

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Solar Radiation and Photovoltaics

Methods: MobileViTv2 · MobileViT