RapidNet: Multi-Level Dilated Convolution Based Mobile Backbone

Mustafa Munir; Md Mostafijur Rahman; Radu Marculescu

arXiv:2412.10995 · cs.CV · December 17, 2024

TL;DR

RapidNet introduces multi-level dilated convolutions to create a purely CNN-based mobile backbone that surpasses state-of-the-art models in accuracy and speed for various vision tasks on mobile devices.

Contribution

This work proposes a novel multi-level dilated convolution approach for CNNs, enabling larger receptive fields and better feature interaction, leading to superior mobile vision model performance.
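The receptive-field claim can be checked with standard convolution arithmetic. The sketch below is not taken from the paper: the function names are hypothetical, and the kernel sizes and dilation rates are illustrative values, not RapidNet's actual configuration. It assumes stride-1 layers, where a kernel of size k with dilation d spans d * (k - 1) + 1 input positions.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Input span covered by one dilated kernel along a single axis."""
    return dilation * (kernel_size - 1) + 1


def stacked_receptive_field(layers) -> int:
    """Theoretical receptive field of stacked stride-1 convolutions.

    `layers` is an iterable of (kernel_size, dilation) pairs.
    Each layer widens the field by dilation * (kernel_size - 1).
    """
    field = 1
    for kernel_size, dilation in layers:
        field += dilation * (kernel_size - 1)
    return field


if __name__ == "__main__":
    # Illustrative settings only (assumed, not from the paper).
    standard = [(3, 1), (3, 1)]
    dilated = [(3, 2), (3, 3)]
    print("standard 3x3 stack:", stacked_receptive_field(standard))
    print("dilated 3x3 stack: ", stacked_receptive_field(dilated))
```

With the same number of weights per layer, the dilated stack covers a wider span of the input than the standard stack, which is the sense in which dilation enlarges the theoretical receptive field.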

Findings

01 Outperforms SOTA mobile CNN, ViT, ViG, and hybrid models in accuracy and speed.

02 RapidNet-Ti achieves 76.3% top-1 accuracy on ImageNet-1K with 0.9 ms latency.

03 Pure CNN architectures can surpass hybrid and ViT models when properly designed.

Abstract

Vision transformers (ViTs) have dominated computer vision in recent years. However, ViTs are computationally expensive and not well suited for mobile devices; this led to the prevalence of convolutional neural network (CNN) and ViT-based hybrid models for mobile vision applications. Recently, Vision GNN (ViG) and CNN hybrid models have also been proposed for mobile vision tasks. However, all of these methods remain slower compared to pure CNN-based models. In this work, we propose Multi-Level Dilated Convolutions to devise a purely CNN-based mobile backbone. Using Multi-Level Dilated Convolutions allows for a larger theoretical receptive field than standard convolutions. Different levels of dilation also allow for interactions between the short-range and long-range features in an image. Experiments show that our proposed model outperforms state-of-the-art (SOTA) mobile CNN, ViT, ViG,…
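To illustrate how different dilation levels can mix short-range and long-range features, here is a toy 1D sketch in plain Python. It is an assumption-laden illustration, not the paper's block: the helper names are hypothetical, the signal is one-dimensional, and summing the branches is just one simple way to combine them.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded, same-length 1D dilated cross-correlation."""
    center = len(kernel) // 2
    output = []
    for i in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            # Taps are spaced `dilation` positions apart around i.
            idx = i + (j - center) * dilation
            if 0 <= idx < len(signal):
                acc += weight * signal[idx]
        output.append(acc)
    return output


def multi_level_dilated(signal, kernel, dilations):
    """Run one branch per dilation level and sum the branches elementwise.

    Summation is an assumed combination rule chosen for simplicity.
    """
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    return [sum(values) for values in zip(*branches)]


if __name__ == "__main__":
    impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
    kernel = [1, 1, 1]
    print("dilation 1:", dilated_conv1d(impulse, kernel, 1))
    print("dilation 3:", dilated_conv1d(impulse, kernel, 3))
    print("combined:  ", multi_level_dilated(impulse, kernel, [1, 3]))
```

The dilation-1 branch responds only at positions adjacent to the impulse, while the dilation-3 branch responds three positions away, so the combined output carries both neighborhoods at once.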

Code & Models

Repositories

mmunir127/rapidnet-official (PyTorch, official)

Models

SLDGroup/RapidNet (Hugging Face model)

Taxonomy

Topics: Energy Efficient Wireless Sensor Networks

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings