MobileViG: Graph-Based Sparse Attention for Mobile Vision Applications

Mustafa Munir; William Avery; Radu Marculescu

arXiv:2307.00395 · cs.CV · July 4, 2023 · 1 citation

TL;DR

MobileViG is a hybrid CNN-GNN architecture built on a graph-based sparse attention mechanism, Sparse Vision Graph Attention (SVGA). It beats existing ViG models and existing mobile CNN and ViT architectures in accuracy and/or speed on image classification, object detection, and instance segmentation.

Contribution

The paper proposes Sparse Vision Graph Attention (SVGA), a graph-based sparse attention mechanism designed for ViGs running on mobile devices, and MobileViG, the first hybrid CNN-GNN architecture for vision tasks on mobile devices, which uses SVGA.

Findings

1. MobileViG-Ti achieves 75.7% top-1 accuracy on ImageNet-1K.
2. MobileViG-B attains 82.6% top-1 accuracy with 2.30 ms latency.
3. MobileViG beats existing ViG models and existing mobile CNN and ViT architectures in accuracy and/or speed.

Abstract

Traditionally, convolutional neural networks (CNN) and vision transformers (ViT) have dominated computer vision. However, recently proposed vision graph neural networks (ViG) provide a new avenue for exploration. Unfortunately, for mobile applications, ViGs are computationally expensive due to the overhead of representing images as graph structures. In this work, we propose a new graph-based sparse attention mechanism, Sparse Vision Graph Attention (SVGA), that is designed for ViGs running on mobile devices. Additionally, we propose the first hybrid CNN-GNN architecture for vision tasks on mobile devices, MobileViG, which uses SVGA. Extensive experiments show that MobileViG beats existing ViG models and existing mobile CNN and ViT architectures in terms of accuracy and/or speed on image classification, object detection, and instance segmentation tasks. Our fastest model, MobileViG-Ti,…

Code & Models

Repositories

sldgroup/mobilevig (PyTorch, official)

Models

🤗 SLDGroup/MobileViG · model · ♡ 1

Taxonomy

Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · Visual Attention and Saliency Detection

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings