ExMobileViT: Lightweight Classifier Extension for Mobile Vision Transformer

Gyeongdong Yang, Yungwook Kwon, and Hyunjin Kim

arXiv:2309.01310 · cs.CV · September 6, 2023 · 2 cites

Open Access

TL;DR

ExMobileViT introduces a lightweight extension to mobile vision transformers that reuses early attention stage information via average pooling, significantly improving accuracy with minimal additional computational cost.

Contribution

The paper presents a novel method to enhance mobile vision transformers by leveraging early attention features, improving performance with negligible overhead.

Findings

01 Notable accuracy improvements over MobileViT on ImageNet

02 Only about 5% increase in parameters

03 Minimal additional computational overhead

Abstract

This paper proposes an efficient structure that improves the performance of mobile-friendly vision transformers with small computational overhead. The vision transformer (ViT) is attractive because it outperforms conventional convolutional neural networks (CNNs) in image classification. Because ViT requires substantial computational resources, MobileNet-based ViT models such as MobileViT-S have been developed. However, their performance does not reach that of the original ViT model. The proposed structure mitigates this weakness by storing the information from early attention stages and reusing it in the final classifier. The work is motivated by the idea that the data from early attention stages can itself carry important meaning for the final classification. In order to reuse the early information in attention stages, the average pooling results of various scaled…
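
The idea in the abstract (pool the outputs of earlier stages and hand them to the final classifier) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the helper names (`global_avg_pool`, `extended_classifier_input`, `linear`, `make_map`), the toy channel counts and spatial sizes, and the choice to combine the pooled vectors by concatenation. Plain Python lists stand in for tensors so the example runs without any dependencies.

```python
def global_avg_pool(feature_map):
    """Average a C x H x W feature map (nested lists) over H and W.

    Returns a list of C floats, one per channel.
    """
    pooled = []
    for channel in feature_map:
        total = sum(sum(row) for row in channel)
        count = sum(len(row) for row in channel)
        pooled.append(total / count)
    return pooled


def extended_classifier_input(stage_maps):
    """Pool every stage output and concatenate the pooled vectors.

    Concatenation is an assumption of this sketch; the abstract only says
    that average pooling results from early stages are reused.
    """
    features = []
    for fmap in stage_maps:
        features.extend(global_avg_pool(fmap))
    return features


def linear(features, weights, bias):
    """Plain fully connected layer: one weight row and one bias per class."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]


def make_map(channels, height, width, value):
    """Hypothetical helper: build a constant C x H x W feature map."""
    return [[[value] * width for _ in range(height)] for _ in range(channels)]


# Toy stand-ins for three stage outputs at decreasing resolution
# (shapes and values are illustrative only).
stage_maps = [
    make_map(2, 8, 8, 1.0),  # early stage: 2 channels, 8x8
    make_map(3, 4, 4, 2.0),  # middle stage: 3 channels, 4x4
    make_map(4, 2, 2, 3.0),  # final stage: 4 channels, 2x2
]

features = extended_classifier_input(stage_maps)  # 2 + 3 + 4 = 9 values

# Two-class classifier over the concatenated vector (arbitrary toy weights).
weights = [[0.1] * len(features), [0.0] * len(features)]
bias = [0.0, 1.0]
logits = linear(features, weights, bias)
print(len(features), logits)
```

In this sketch the only added learnable parameters are the extra classifier weights for the pooled early-stage channels, which is consistent with the small overhead the summary describes, though the actual layer layout in ExMobileViT may differ.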

Taxonomy

Topics: CCD and CMOS Imaging Sensors · Advanced Memory and Neural Computing · Image Enhancement Techniques

Methods: Attention Is All You Need · Softmax · Linear Layer · Multi-Head Attention · Residual Connection · Average Pooling · Layer Normalization · Dense Connections · Vision Transformer