Separable Self-attention for Mobile Vision Transformers
Sachin Mehta, Mohammad Rastegari

TL;DR
This paper introduces a separable self-attention mechanism with linear complexity for mobile vision transformers, significantly improving efficiency and performance on mobile vision tasks.
Contribution
It proposes a novel separable self-attention method with linear complexity, enabling faster and more resource-efficient mobile vision transformers.
Findings
MobileViTv2 achieves 75.6% top-1 accuracy on ImageNet.
MobileViTv2 runs 3.2 times faster than MobileViT on a mobile device.
MobileViTv2 outperforms MobileViT by about 1% in top-1 accuracy on ImageNet.
Abstract
Mobile vision transformers (MobileViT) can achieve state-of-the-art performance across several mobile vision tasks, including classification and detection. Though these models have fewer parameters, they have high latency as compared to convolutional neural network-based models. The main efficiency bottleneck in MobileViT is the multi-headed self-attention (MHA) in transformers, which requires $O(k^2)$ time complexity with respect to the number of tokens (or patches) $k$. Moreover, MHA requires costly operations (e.g., batch-wise matrix multiplication) for computing self-attention, impacting latency on resource-constrained devices. This paper introduces a separable self-attention method with linear complexity, i.e., $O(k)$. A simple yet effective characteristic of the proposed method is that it uses element-wise operations for computing self-attention, making it a good choice for…
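To make the linear-complexity claim concrete, here is a minimal sketch of separable self-attention in plain Python. It assumes the three-branch formulation (input, key, value) with ReLU on the value branch, a single image, and no bias terms. The weight names `w_i`, `w_k`, `w_v`, `w_o` and the helper functions are illustrative, not taken from the authors' code. Each token gets one scalar score, the scores weight the key projections into a single context vector, and that vector is multiplied element-wise into every value row, so no k × k attention matrix is ever formed.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x m) list-of-lists matrix by an (m x p) one."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(values):
    """Numerically stable softmax over a flat list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def separable_self_attention(x, w_i, w_k, w_v, w_o):
    """x: k tokens x d channels; w_i: d x 1; w_k, w_v, w_o: d x d (all assumed shapes)."""
    # Input branch: one scalar per token, normalised across the k tokens.
    scores = softmax([row[0] for row in matmul(x, w_i)])
    # Key branch: score-weighted sum of key projections gives one context vector of length d.
    keys = matmul(x, w_k)
    d = len(keys[0])
    context = [sum(s * key[j] for s, key in zip(scores, keys)) for j in range(d)]
    # Value branch: ReLU, then element-wise product with the broadcast context vector.
    values = [[max(0.0, v) for v in row] for row in matmul(x, w_v)]
    gated = [[v * c for v, c in zip(row, context)] for row in values]
    # Output projection back to d channels per token.
    return matmul(gated, w_o), scores


def random_matrix(rows, cols, rng):
    """Hypothetical helper: random weights stand in for trained parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
k, d = 6, 4  # 6 tokens (patches), 4 channels
x = random_matrix(k, d, rng)
w_i = random_matrix(d, 1, rng)
w_k, w_v, w_o = (random_matrix(d, d, rng) for _ in range(3))
out, scores = separable_self_attention(x, w_i, w_k, w_v, w_o)
print(len(out), len(out[0]))  # one d-dimensional output row per token
```

Every step iterates over the k tokens a fixed number of times, so the cost grows linearly with k. Standard self-attention would instead compute a score for every pair of tokens.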
Code & Models
- leondgarse/keras_cv_attention_models/tree/main/keras_cv_attention_models/mobilevit (tf)
- rwightman/pytorch-image-models (pytorch)
- IMvision12/keras-vision-models (pytorch)
- https://gitlab.com/birder/birder (pytorch)
- mindspore-courses/External-Attention-MindSpore/blob/main/model/attention/MobileViTv2Attention.py (mindspore)
- 🤗 kadirnar/timm_model_list (model) · ♡ 1
- 🤗 timm/mobilevitv2_050.cvnets_in1k (model) · 11k dl · ♡ 2
- 🤗 timm/mobilevitv2_075.cvnets_in1k (model) · 5.6k dl
- 🤗 timm/mobilevitv2_100.cvnets_in1k (model) · 4.2k dl · ♡ 1
- 🤗 timm/mobilevitv2_125.cvnets_in1k (model) · 57 dl
- 🤗 timm/mobilevitv2_150.cvnets_in1k (model) · 4.3k dl
- 🤗 timm/mobilevitv2_150.cvnets_in22k_ft_in1k (model) · 190 dl
- 🤗 timm/mobilevitv2_150.cvnets_in22k_ft_in1k_384 (model) · 37 dl
- 🤗 timm/mobilevitv2_175.cvnets_in1k (model) · 291 dl
- 🤗 timm/mobilevitv2_175.cvnets_in22k_ft_in1k (model) · 218 dl
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Solar Radiation and Photovoltaics
Methods: MobileViTv2 · MobileViT
