Navigating Efficiency in MobileViT through Gaussian Process on Global Architecture Factors

Ke Meng; Kai Chen

arXiv:2406.04820·cs.CV·June 10, 2024

TL;DR

This paper uses Gaussian processes to analyze and optimize MobileViT architecture factors, reducing computational costs while maintaining high accuracy across datasets.

Contribution

It introduces a systematic approach to explore architecture relationships and derive smaller, efficient MobileViT models with improved performance.

Findings

01 Outperforms CNNs and mobile ViTs on multiple datasets

02 Provides a formula for downsizing architectures under MAC constraints

03 Identifies design principles for efficient MobileViT configurations

Abstract

Numerous techniques have been meticulously designed to achieve optimal architectures for convolutional neural networks (CNNs), yet a comparable focus on vision transformers (ViTs) has been somewhat lacking. Despite the remarkable success of ViTs in various vision tasks, their heavyweight nature makes them computationally costly. In this paper, we leverage the Gaussian process to systematically explore the nonlinear and uncertain relationship between performance and the global architecture factors of MobileViT: resolution, width, and depth (including the depth of inverted residual blocks and the depth of ViT blocks), together with joint factors such as resolution-depth and resolution-width. We present design principles twisting the magic 4D cube of the global architecture factors that minimize model sizes and computational costs with higher model accuracy. We introduce a formula for…
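To make the idea of a Gaussian process over architecture factors concrete, the following is a minimal sketch of GP regression with an RBF kernel, written in plain Python. It is not the authors' implementation. The inputs are hypothetical normalized (resolution, width, depth) multipliers, `toy_score` is a made-up stand-in for measured accuracy, and the kernel, length scale, and noise level are all assumptions chosen only for illustration.

```python
import itertools
import math


def rbf(a, b, length=0.5):
    """Squared-exponential kernel between two factor vectors."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * sq / length ** 2)


def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_lower(L, b):
    """Solve L z = b by forward substitution."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def solve_upper(L, z):
    """Solve L^T x = z by back substitution."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_fit(X, y, noise=1e-6, length=0.5):
    """Precompute the Cholesky factor and weights of a zero-mean GP."""
    n = len(X)
    K = [[rbf(X[i], X[j], length) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper(L, solve_lower(L, y))
    return {"X": X, "L": L, "alpha": alpha, "length": length}


def gp_predict(model, x):
    """Posterior mean and variance of the GP at a new factor vector x."""
    k = [rbf(x, xi, model["length"]) for xi in model["X"]]
    mean = sum(ki * ai for ki, ai in zip(k, model["alpha"]))
    v = solve_lower(model["L"], k)
    var = rbf(x, x, model["length"]) - sum(vi * vi for vi in v)
    return mean, max(var, 0.0)


def toy_score(r, w, d):
    """Made-up saturating score; NOT data from the paper."""
    return 1.0 - math.exp(-(r + w + d) / 2.0)


# Hypothetical design points: every combination of two multipliers per factor.
X = list(itertools.product([0.5, 1.0], repeat=3))
y = [toy_score(*x) for x in X]
model = gp_fit(X, y)

# Query an untried configuration; the variance expresses the GP's uncertainty.
mean, var = gp_predict(model, (0.75, 0.75, 0.75))
print(f"predicted score {mean:.3f}, variance {var:.3f}")
```

The posterior variance is what makes a GP suited to this kind of exploration: it is close to zero at configurations that have already been evaluated and grows toward the prior variance for configurations far from any evaluated point.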

Taxonomy

Topics: Context-Aware Activity Recognition Systems · Mobile and Web Applications · Anomaly Detection Techniques and Applications

Methods: Focus · Gaussian Process · MobileViT